Preface



The Japanese Society for Artificial Intelligence (JSAI) was established in July
1986. Since then, we have held conferences every year. Although JSAI is the
second largest community in the world focusing on the area of Artificial
Intelligence and we have over 3,000 members, the importance of the research
presented and discussions held at the annual conferences has not been fully
recognized in the Artificial Intelligence communities elsewhere in the world,
partly because most presentations are made in the Japanese language.
Therefore, the program committee of the Fifteenth Annual Conference of JSAI
decided to open the door to the world and hold international workshops
during the conference on May 20th and 25th, 2001 in Matsue City, Japan.

The workshop proposals were gathered from the members of JSAI. We 
accepted the following up-to-date and exciting topics: 1) Social Intelligence 
Design chaired by Prof. Toyoaki Nishida, University of Tokyo, 2) Agent-Based 
Approaches in Economic and Social Complex Systems chaired by Prof. Akira 
Namatame, National Academy of Defense, 3) Rough Set Theory and Granular 
Computing chaired by Prof. Shusaku Tsumoto, Shimane Medical University, 
4) Chance Discovery chaired by Prof. Yukio Ohsawa, and 5) Challenge in
Knowledge Discovery and Data Mining chaired by Prof. Takashi Washio, 
Osaka University. These workshops were well received and successful. A
total of 116 people in Japan and 30 researchers from abroad participated in 
them. 

This volume of the proceedings contains selected papers presented at the
workshops. The contents of the volume are divided into five parts, each of
which corresponds to one of the workshop topics. Each paper was rigorously
reviewed by the committee members of the workshops. The papers also cover
recent, diverse areas of artificial intelligence. We believe that the volume
will be highly useful for both researchers and practitioners who are interested
in recent advances in artificial intelligence.
JSAI Workshops as International Trends



Looking at the current economic, political, and ecological situations, we
become aware of the dynamic environment surrounding all human activities.
At the same time, the expansion of the World Wide Web is activating the whole
globe as an information system including humans, computers, and networks.

The workshop topics associated with JSAI 2001 were designed to address such
worldwide trends. Social Information Designs are needed to aid the mutual
progress of human society and various kinds of information flows. The
Agent-Based Simulations consider social behavior from the aspect of economics,
with the up-to-date viewpoint of complexity. Rough Set Theories
may achieve a breakthrough in dealing with uncertain real-world
events on the basis of established theories. Chance Discovery is a new direction
proposed by Japanese researchers for helping people and agents become aware
of novel information that is significant for their own decisions in dynamic
environments. KDD-Challengers are responding to requirements for new knowledge
to be obtained from new data in new social situations.

I am sure the selected papers from these first international workshops 
associated with JSAI will win the attention of people from several different 
areas of research, not only artificial intelligence but also social sciences and 
other areas looking into the future of human life. A piece of good news for 
those readers is that JSAI is becoming increasingly international, after many 
years as a semi-domestic Japanese AI community. With the foundation of five 
workshop themes this year, the new generation of AI researchers is finding 
new problems and new solutions in a creative atmosphere. On behalf of all
the workshop organizers, I wish to draw readers’ attention to forthcoming 
international JSAI events. 

Before turning to the contents, let us express our gratitude for the great
support given by the co-editors who organized each workshop, all authors
and audiences, JSAI committee members, Shimane prefecture and Matsue
city, and Jun’ichiro Mori of the University of Tokyo, whose operations greatly
aided this publication.



October 2001 



Yukio Ohsawa 




Table of Contents 



Part I. Social Intelligence Design 



1. Social Intelligence Design — An Overview 

Toyoaki Nishida 3 

1.1 Introduction 3 

1.2 Horizon of Social Intelligence Design 4 

1.2.1 Methods of Establishing the Social Context 6 

1.2.2 Embodied Conversational Agents and Social Intelligence 6 

1.2.3 Collaboration Design 7 

1.2.4 Public Discourse 8 

1.2.5 Theoretical Aspects of Social Intelligence Design 8 

1.2.6 Evaluations of Social Intelligence 9 

1.3 Concluding Remarks 10 

2. FaintPop: In Touch with the Social Relationships

Takeshi Ohguro, Kazuhiro Kuwabara, Tatsuo Owada, and 

Yoshinari Shirai 11 

2.1 Social Intelligence Design for Communications 11 

2.2 In Touch with the Social Relationships 13 

2.3 Initial Experiment 16 

2.4 Conclusion and Related Works 17 

3. From Virtual Environment to Virtual Community 

A. Nijholt 19 

3.1 Introduction 19 

3.2 Towards Multi-user Virtual Worlds 19 

3.2.1 Interacting Embodied Personalities 20 

3.2.2 Embodied Personalities in Virtual Worlds 21 

3.3 Building a Theater Environment 23 

3.4 Interacting about Performances and Environment 24 

3.5 Towards a Theater Community 25 







4. Collaborative Innovation Tools 

John C. Thomas 27 

4.1 Importance of Collaboration: Practical and Scientific 27 

4.2 New Technological Possibilities 29 

4.3 Work of the Knowledge Socialization Group 31 

5. Bricks & Bits & Interaction 

R. Fruchter 35 

5.1 Introduction 35 

5.2 Visibility, Awareness, and Interaction in Videoconference Space 36 

5.3 Mobile Learners in E-learning Spaces 39 

5.4 Emerging Changes Influenced by Bricks & Bits & Interaction 41 

6. A Distributed Multi-agent System for the Self-Evaluation 
of Dialogs 

Alain Cardon 43 

6.1 Introduction 43 

6.2 System General Architecture 44 

6.3 Representation of the Semantic of the Communication Act 45

6.4 Semantic Traits and Agents 46 

6.5 Aspectual Agent Organization 46 

6.6 The Emerging Meaning of the Communication: 

The Morphological Agent Organization 48 

6.7 Interpretation of the Morphological Organization: 

The Evocation Agents 49 

6.8 Conclusion 50 

7. Public Opinion Channel: 

A System for Augmenting Social Intelligence of a 
Community 

Tomohiro Fukuhara, Toyoaki Nishida, and Shunsuke Uemura 51 

7.1 Introduction 51 

7.2 Communication Costs 52 

7.3 POC Prototype System 53 

7.3.1 POC Server 53 

7.3.2 POC Client: POCViewer 54 

7.4 Evaluation 57 

7.5 Discussion 57 

7.5.1 Automatic Broadcasting System 57 

7.5.2 POC and Narrative Intelligence 58 

7.6 Conclusion 58 







8. Enabling Public Discourse 

Keiichi Nakata 59 

8.1 Introduction 59 

8.2 Enabling Individuals to Collect and Exchange Information 

and Opinions 60 

8.3 Raising Social Awareness through Position-Oriented 

Discussions 62 

8.3.1 Positioning-Oriented Discussion Interface 63 

8.4 Towards “Social Intelligence Design” 64 

8.5 Concluding Remark 65 

9. Internet, Discourses, and Democracy 

R. Luehrs, T. Malsch, and K. Voss 67 

9.1 Introduction 67 

9.2 Online Support for Democratic Processes 67 

9.3 A Novel Participation Methodology 69 

9.4 System Design 72 

10. How to Evaluate Social Intelligence Design 

Nobuhiko Fujihara 75 

10.1 Computer Networked Community as Social Intelligence 75 

10.2 The Importance of Control Condition in Evaluating Social 

Intelligence Design 76 

10.3 How to Evaluate POC 77 

10.4 Future Works 81 



Part II. Agent-Based Approaches in Economic and Social Complex 
Systems 



11. Overview 

Akira Namatame 85 

12. Analyzing Norm Emergence in Communal Sharing via 
Agent-Based Simulation 

Setsuya Kurahashi and Takao Terano 88 

12.1 Introduction 88 

12.2 Related Work on Studies of Norms 89 

12.3 Artificial Society Model TRURL 90 

12.3.1 Agent Architecture 90 

12.3.2 Communication and Action Energy 91 

12.3.3 Inverse Simulation 91 

12.4 Experiments 92 







12.4.1 An Amount of Information in Each Society 92 

12.4.2 Emergence and Collapse of a Norm 93 

12.4.3 Emergence and Control of Free Riders 94 

12.4.4 Information Gap 95 

12.4.5 Discussion 96 

12.5 Conclusion 97 

13. Toward Cumulative Progress in Agent-Based Simulation 

Keiki Takadama and Katsunori Shimohara 99 

13.1 Introduction 99 

13.2 Can We Assist Cumulative Progress? 100 

13.2.1 Problems in Agent-Based Approaches 100 

13.2.2 Points for Cumulative Progress 100 

13.2.3 Cumulative Progress in Current Projects 101 

13.3 Exploring Key Elements 101 

13.3.1 Interpretation by Implementation 102 

13.3.2 Applications of IbI Approach 103

13.4 Discussion 104 

13.4.1 Cumulative Progress 104 

13.4.2 Potential of Our Approach 105 

13.5 Conclusions 107 

14. Complexity of Agents and Complexity of Markets 

Kiyoshi Izumi 110 

14.1 Introduction 110 

14.2 The Efficient Market Hypothesis Seen from Complexity 111

14.3 Artificial Market Model 112 

14.3.1 Expectation 112 

14.3.2 Order 113 

14.3.3 Price Determination 113 

14.3.4 Learning 113 

14.4 Simulation Result 114 

14.4.1 Merit of Complicating a Prediction Formula 114 

14.4.2 The Demerit in the Whole Market 115 

14.4.3 Development of the Complexity of a Market 115 

14.5 New Efficient Market Hypothesis 118 

14.6 Conclusion 119 







15. U-Mart Project: Learning Economic Principles from the 
Bottom by Both Human and Software Agents 

Hiroshi Sato, Hiroyuki Matsui, Isao Ono, Hajime Kita, Takao Terano, 
Hiroshi Deguchi, and Yoshinori Shiozawa 121 

15.1 Introduction 121 

15.2 Outlines of U-Mart System 122 

15.3 Outline of Open Experiment, Pre U-Mart 2000 123 

15.3.1 Open Experiment and Its Objectives 123 

15.3.2 Experimental System 123 

15.3.3 Configuration of Experiment 123 

15.4 Participated Agents and Their Strategies 123 

15.5 Experimental Result 126 

15.5.1 First Round 126 

15.5.2 Second Round 127 

15.5.3 Variety of Agents 127 

15.5.4 Reason of Heavy Rises and Falls 128 

15.6 Experiments with Human Agents 129 

15.7 Conclusion and Acknowledgements 130 

16. A Multi-objective Genetic Algorithm Approach to 
Construction of Trading Agents for Artificial Market 
Study 

Rikiya Fukumoto and Hajime Kita 132 

16.1 Introduction 132 

16.2 The U-Mart System 133 

16.3 Multi-objective Genetic Algorithms (MOGA) 133

16.4 Construction of Trading Agents with a MOGA 134

16.4.1 Structure of Trading Agents 134

16.4.2 Implementation of MOGA 137

16.5 Results of Experiments 139 

16.6 Conclusion 140 

17. Agent-Based Simulation for Economic and 
Environmental Studies 

Hideyuki Mizuta and Yoshiki Yamagata 142 

17.1 Introduction 142 

17.2 Agent-Based Simulation Framework: ASIA 143 

17.3 Market Simulation 145 

17.4 Dynamic Online Auctions 146 

17.5 Greenhouse Gas Emissions Trading 147 

17.6 Concluding Remarks 151 







18. Avatamsaka Game Experiment as a Nonlinear Polya 
Urn Process 

Yuji Aruka 153 

18.1 Characteristics of Avatamsaka Game 154 

18.1.1 Synchronization 154 

18.1.2 A Two Person Game Form 155 

18.1.3 No Complementarities Except for Positive Spillovers 

to Be Found 156 

18.2 Avatamsaka Game Experiment as a Nonlinear 

Polya Urn Process 157 

18.2.1 The Elementary Polya Process 157 

18.2.2 A Generalized Polya Urn Process 158 

18.2.3 A Nonlinear Polya Process 160 

19. Effects of Punishment into Actions in Social Agents 

Keji Suzuki 162 

19.1 Introduction 162 

19.2 The Tragedy of the Common 163 

19.3 Coevolving Levy Plan and Payoff Prediction 164 

19.3.1 Approach 164 

19.3.2 Relation between Levy Plan and Payoff Prediction 165

19.3.3 Reward of Agent and Incoming Levy of Meta-agent 166

19.3.4 Evaluation of Game 167 

19.3.5 Coevolution of Plan and Predictions 167 

19.4 Simulation 169 

19.4.1 Game without Meta-agent 169 

19.4.2 Simulations with Meta-agents 169 

19.5 Conclusion 172 

20. Analysis of Norms Game with Mutual Choice

Tomohisa Yamashita, Hidenori Kawamura, Masahito Yamamoto, 
and Azuma Ohuchi 174 

20.1 Introduction 174 

20.2 Mutual Choice in Group Formation 175 

20.2.1 Norms Game with Mutual Choice 175 

20.2.2 Metanorms Game with Mutual Choice 177 

20.3 Simulation Setup 177 

20.4 Simulation 178 

20.4.1 Maintenance of Norm 178 

20.4.2 Establishment of Norm 180 

20.5 Conclusion 183 







21. Cooperative Co-evolution of Multi-agents 

Sung-Bae Cho 185 

21.1 Introduction 185 

21.2 Evolutionary Approach to IPD Game 186 

21.3 Cooperative Co-evolution of Strategies 187 

21.3.1 Forming Coalition 187 

21.3.2 Evolving Strategy Coalition 188 

21.3.3 Gating Strategies in Coalition 188 

21.4 Experimental Results 190 

21.4.1 Evolution of Strategy Coalition 190 

21.4.2 Gating Strategies 191 

21.5 Concluding Remarks 192 

22. Social Interaction as Knowledge Trading Games 

Kazuyo Sato and Akira Namatame 195 

22.1 Introduction 195 

22.2 Knowledge Transaction as Knowledge Trading Games 197 

22.3 Knowledge Trading as Symmetric and 

Asymmetric Coordination Games 198 

22.4 Aggregation of Heterogeneous Payoff Matrices 201 

22.5 The Collective Behavior in Knowledge Transaction 203 

22.6 Conclusion 206 

23. World Trade League as a Standard Problem for 
Multi-agent Economics Concept and Background 

Koichi Kurumatani and Azuma Ohuchi 208 

23.1 Introduction 208 

23.2 Concept of World Trade League 209 

23.3 Elements of World Trade League 210 

23.3.1 Behavior Options of Agents and Market Structure 210

23.3.2 Game Settings and Complexity 211 

23.3.3 Evaluation Function of Players 212 

23.4 Implementation 212 

23.4.1 System Architecture 212 

23.4.2 Communication Protocol X-SS 213 

23.5 Requirements for Standard Problem in 

Multi-agent Economics 214 

23.6 Related Work 215 

23.7 Conclusion 216 







24. Virtual Economy Simulation and Gaming 
— An Agent Based Approach — 

Hiroshi Deguchi, Takao Terano, Koichi Kurumatani, Taro Yuzawa, 

Shigeji Hashimoto, Hiroyuki Matsui, Akio Sashima, and 

Toshiyuki Kaneda 218 

24.1 Introduction 218 

24.2 Agent Based Simulation Model for Virtual Economy 219 

24.3 Result of Simulation 223 

24.4 Conclusion 225 

25. Boxed Economy Foundation Model: Model Framework 
for Agent-Based Economic Simulations 

Takashi Iba, Yohei Takabe, Yoshihide Chubachi, 

Junichiro Tanaka, Kenichi Kamihashi, Ryunosuke Tsuya, 

Satomi Kitano, Masaharu Hirokane, and Yoshiaki Matsuzawa 227 

25.1 Introduction 227 

25.2 Model Framework for Agent-Based Economic Simulations 228

25.3 Boxed Economy Foundation Model 228 

25.3.1 EconomicActor, SocialGroup, Individual 229 

25.3.2 Goods, Information, Possession 231 

25.3.3 Behavior, BehaviorManagement, Memory, Needs 232 

25.3.4 Relation, Path 232 

25.4 Applying Boxed Economy Foundation Model 233 

25.4.1 Modeling Behavior Rather than Agent 233 

25.4.2 Flexibility on the Boundary of Agent 233 

25.4.3 Example: Sellers in Distribution Mechanism 234 

25.5 Conclusion 235 



Part III. Rough Set Theory and Granular Computing 



26. Workshop on Rough Set Theory and Granular 
Computing Summary 

Shusaku Tsumoto, Shoji Hirano, and Masahiro Inuiguchi 239 

27. Bayes’ Theorem Revised The Rough Set View 

Zdzislaw Pawlak 240 

27.1 Introduction 240 

27.2 Bayes’ Theorem 241 

27.3 Information Systems and Approximation of Sets 242 

27.4 Rough Membership 244 

27.5 Information Systems and Decision Rules 244 

27.6 Probabilistic Properties of Decision Tables 245 

27.7 Decision Tables and Flow Graphs 246 







27.8 Comparison of Bayesian and Rough Set Approach 247 

27.9 Conclusion 249 

28. Toward Intelligent Systems: Calculi of Information 
Granules 

Andrzej Skowron 251 

28.1 Introduction 251 

28.2 AR-Schemes 254 

28.3 Rough Neural Networks 255 

28.4 Decomposition of Information Granules 256 

29. Soft Computing Pattern Recognition: Principles, 

Integrations, and Data Mining 

Sankar K. Pal 261 

29.1 Introduction 261 

29.2 Relevance of Fuzzy Set Theory in Pattern Recognition 262 

29.3 Relevance of Neural Network Approaches 264 

29.4 Genetic Algorithms for Pattern Recognition 265 

29.5 Integration and Hybrid Systems 266 

29.6 Evolutionary Rough Fuzzy MLP 267 

29.7 Data Mining and Knowledge Discovery 268 

30. Identifying Upper and Lower Possibility Distributions 
with Rough Set Concept 

P. Guo and Hideo Tanaka 272 

30.1 Concepts of Upper and Lower Possibility Distributions 272 

30.2 Comparison of Dual Possibility Distributions with 

Dual Approximations in Rough Sets Theory 273 

30.3 Identification of Upper and Lower Possibility Distributions 274

30.4 Conclusions 277 

31. On Fractals in Information Systems: The First Step 

Lech Polkowski 278 

31.1 Introduction 278 

31.2 Fractal Dimensions 278 

31.3 Rough Sets and Topologies on Rough Sets 279 

31.4 Fractals in Information Systems 280 

31.5 Conclusions 282 

32. Generalizations of Fuzzy Multisets for Including 
Infiniteness 

Sadaaki Miyamoto 283 

32.1 Introduction 283 







32.2 Multisets and Fuzzy Multisets 284 

32.3 Infinite Memberships 285 

32.4 A Set-Valued Multiset 286 

32.5 Conclusion 287 

33. Fuzzy c-Means and Mixture Distribution Model for 
Clustering Based on L1-Space

Takatsugu Koga, Sadaaki Miyamoto, and Osamu Takata 289 

33.1 Introduction 289 

33.2 Fuzzy c-Means Based on L1-Space 289

33.3 Mixture Distribution Based on L1-Space 291

33.4 Conclusion 293 

34. On Rough Sets under Generalized Equivalence Relations 

Masahiro Inuiguchi and Tetsuzo Tanino 295 

34.1 Introduction 295 

34.2 The Original Rough Sets 296 

34.3 Two Different Problem Settings 297 

34.4 Approximation by Means of Elementary Sets 298 

34.5 Distinction among Three Regions 298 

35. Two Procedures for Dependencies among Attributes in a 
Table with Non-deterministic Information: A Summary 

Hiroshi Sakai 301 

35.1 Preliminary 301 

35.2 Definitions of NISs 302 

35.3 A Way to Obtain All Possible Equivalence Relations 303 

35.4 Procedure 1 for Dependencies 303 

35.5 Procedure 2 for Dependencies 304 

35.6 Execution Time of Every Method 304 

35.7 Concluding Remarks 305 

36. An Application of Extended Simulated Annealing 
Algorithm to Generate the Learning Data Set for Speech 
Recognition System 

Chi-Hwa Song and Won Don Lee 306 

36.1 Introduction 306 

36.2 Domain Definition for LDS Extraction 306 

36.3 The Numerical Formula for LDS Extraction 307 

36.4 The Algorithm for Extraction of LDS 308 

36.5 Experimental and Result 309 

36.6 Conclusion 310 







37. Generalization of Rough Sets with 
α-Coverings of the Universe Induced by
Conditional Probability Relations 

Rolly Intan, Masao Mukaidono, and Y.Y. Yao 311

37.1 Introduction 311 

37.2 Conditional Probability Relations 312 

37.3 Generalized Rough Sets Approximation 313 

37.4 Conclusions 315 

38. On Mining Ordering Rules 

Y.Y. Yao and Ying Sai 316 

38.1 Introduction 316 

38.2 Ordered Information Tables 317 

38.3 Mining Ordering Rules 318 

38.4 Conclusion 320 

39. Non-additive Measures by Interval Probability Functions 

Hideo Tanaka, Kazutomi Sugihara, and Yutaka Maeda 322 

39.1 Introduction 322 

39.2 Interval Probability Functions 323 

39.3 Combination and Conditional Rules for IPF 325 

39.4 Concluding Remarks 326 

40. Susceptibility to Consensus of Conflict Profiles 

Ngoc Thanh Nguyen 327 

40.1 Introduction 327 

40.2 Conflict Profiles 327 

40.3 Susceptibility to Consensus 329 

40.4 Conclusions 331 

41. Analysis of Image Sequences for the Unmanned Aerial 
Vehicle 

Hung Son Nguyen, Andrzej Skowron, and Marcin S. Szczuka 333 

41.1 Introduction 333 

41.2 Data Description 334 

41.3 The Task 334 

41.4 The Method 334 

41.5 Results 335 

41.6 Conclusions 337 







42. The Variable Precision Rough Set Inductive Logic 
Programming Model and Web Usage Graphs 

V. Uma Maheswari, Arul Siromoney, and K.M. Mehata 339 

42.1 Introduction 339 

42.2 The VPRSILP Model and Web Usage Graphs 339 

42.2.1 A Simple-Graph-VPRSILP-ESD System 340

42.2.2 Web Usage Graphs 340 

42.3 Experimental Illustration 341 

42.4 Conclusions 343

43. Optimistic Priority Weights with an Interval 
Comparison Matrix 

Tomoe Entani, Hidetomo Ichihashi, and Hideo Tanaka 344 

43.1 Introduction 344 

43.2 Interval AHP with Interval Comparison Matrix 345

43.3 Choice of Optimistic Weights and Efficiency by DEA 346

43.3.1 DEA with Normalized Data 346 

43.3.2 Optimistic Importance Grades in Interval Importance 

Grades 346 

43.4 Numerical Example 347 

43.5 Concluding Remarks 348

44. Rough Set Theory in Conflict Analysis 

Rafal Deja and Dominik Ślęzak 349

44.1 Introduction 349 

44.2 Conflict Model 350

44.3 Analysis 352

44.4 Conclusions 352

45. Dealing with Imperfect Data by RS-ILP 

Chunnian Liu and Ning Zhong 354

45.1 Introduction 354 

45.2 Imperfect Data in ILP 355 

45.3 RS-ILP for Missing Classification 356

45.4 RS-ILP for Too Strong Bias 357

45.5 Concluding Remarks 357

46. Extracting Patterns Using Information Granules: 

A Brief Introduction 

Andrzej Skowron, Jaroslaw Stepaniuk, and James F. Peters 359 

46.1 Introduction 359 

46.2 Granule Decomposition 359 







47. Classification Models Based on Approximate Bayesian 
Networks 

Dominik Ślęzak 364

47.1 Introduction 364 

47.2 Frequencies in Data 364 

47.3 Approximate Independence 365 

47.4 Bayesian Classification 366 

47.5 Approximate Bayesian Networks 367 

47.6 Conclusions 368 

48. Identifying Adaptable Components 
A Rough Sets Style Approach 

Yoshiyuki Shinkawa and Masao J. Matsumoto 370 

48.1 Introduction 370 

48.2 Defining Adaptation of Software Components 370 

48.3 Identifying One-to-One Component Adaptation 371 

48.4 Identifying One-to-Many Component Adaptation 373 

48.5 Conclusions 374 

49. Rough Measures and Integrals: A Brief Introduction 

Zdzislaw Pawlak, James F. Peters, Andrzej Skowron, Z. Suraj, 

S. Ramanna, and M. Borkowski 375 

49.1 Introduction 375 

49.2 Classical Additive Set Functions 376 

49.3 Basic Concepts of Rough Sets 376 

49.4 Rough Integrals 377 

49.5 Relevance of a Sensor 378 

49.6 Conclusion 378 

50. Association Rules in Semantically Rich Relations: 

Granular Computing Approach 

T. Y. Lin and Eric Louie 380 

50.1 Introduction 380 

50.2 Relational Models and Rough Granular Structures 380 

50.3 Databases with Additional Semantics 381 

50.4 Mining Real World or Its Representations 382 

50.5 Clustered Association Rules-Mining Semantically 383 

50.6 Conclusion 383 

51. A Note on Filtration and Granular Reasoning 

Tetsuya Murai, Michinori Nakata, and Yoshiharu Sato 385 

51.1 Introduction 385 

51.2 Preliminaries 385 







51.3 Relative Filtration with Approximation 386 

51.4 Example of Granular Reasoning 388 

51.5 Concluding Remarks 389 

52. A Note on Conditional Logic and Association Rules 

Tetsuya Murai, Michinori Nakata, and Yoshiharu Sato 390 

52.1 Introduction 390 

52.2 Association Rules 391 

52.3 Previous Works 391 

52.4 Graded Conditional Logic 392 

52.5 Concluding Remarks 394 

53. Analysis of Self-Injurious Behavior by the LERS Data 
Mining System 

Rachel L. Freeman, Jerzy W. Grzymala-Busse, Laura A. Riffel, 

and Stephen R. Schroeder 395 

53.1 Introduction 395 

53.2 Data Mining 396 

53.3 Results 397 

53.4 Conclusions 398 

54. A Clustering Method for Nominal and Numerical Data 
Based on Rough Set Theory 

Shoji Hirano, Shusaku Tsumoto, Tomohiro Okuzaki, and 

Yutaka Hata 400 

54.1 Introduction 400 

54.2 Clustering Method 401 

54.2.1 Initial Equivalence Relation 401 

54.2.2 Modification of Equivalence Relations 402 

54.2.3 Evaluation of Validity 403 

54.3 Experimental Results 404 

54.4 Conclusions 404 

55. A Design of Architecture for Rough Set Processor 

Akinori Kanasugi 406 

55.1 Introduction 406 

55.2 Architecture 406 

55.2.1 Data Format 406 

55.2.2 Execution Process 407 

55.2.3 Discernibility Matrix Maker 407 

55.2.4 Core Selector 408 

55.2.5 Covering Unit 408 

55.2.6 Reconstruction Unit 408 







55.2.7 Implementation 409 

55.2.8 Performance Analysis 409 

55.3 Conclusion 410 



Part IV. Chance Discovery 



56. The Scope of Chance Discovery 

Yukio Ohsawa 413 

57. Chance Discovery Using Dialectical Argumentation 

Peter McBurney and Simon Parsons 414 

57.1 Introduction 414 

57.2 Argumentation 415 

57.3 The Discovery Agora: Formal Structure 417 

57.3.1 Discovery Dialogues 417 

57.3.2 Model of a Discovery Dialogue 418 

57.3.3 Dialogue Game Rules 420 

57.4 Conclusion 423 

58. Methodological Considerations on Chance Discovery 

Helmut Prendinger and Mitsuru Ishizuka 425 

58.1 Introduction 425 

58.2 Nature vs. Open Systems 426 

58.2.1 Prediction in the Natural Sciences 426 

58.2.2 Prediction in Open Systems 427 

58.3 Chance Discovery in Open Systems 427 

58.3.1 Enterprise Example 427 

58.3.2 The Limits of Regulatory Mechanisms 427 

58.3.3 Chance Discovery as Anticipation 428 

58.4 Chance Discovery, Uncertainty, Freedom 429 

58.4.1 Freedom 429 

58.4.2 Explaining versus Predicting 430 

58.5 Scientific Evaluation of Theories 430 

58.6 Chance Discovery vs. KDD 431 

58.7 Discussion and Conclusion 432 

59. Future Directions of Communities on the Web 

Naohiro Matsumura, Yukio Ohsawa, and Mitsuru Ishizuka 435 

59.1 Introduction 435 

59.2 Related Researches 436 

59.2.1 Discovery of Communities 436 

59.2.2 Discovery of Future Directions 437 

59.3 Future Directions of Communities 438 







59.3.1 How to Discover the Future Directions? 438 

59.3.2 The Detailed Process 439 

59.4 Experiments and Discussions 440 

59.4.1 Future Directions of Portal Sites 440 

59.4.2 Future Directions of Book Site 441 

59.4.3 Future Directions of Artificial Intelligence 442 

59.5 Conclusions 442 

60. A Document as a Small World 

Yutaka Matsuo, Yukio Ohsawa, and Mitsuru Ishizuka 444 

60.1 Introduction 444 

60.2 Small World 444 

60.3 Term Co-occurrence Graph 445 

60.4 Finding Important Terms 446 

60.5 Example 447 

60.6 Conclusion 448 

61. Support System for Creative Activity by Information 
Acquirement through Internet 

Wataru Sunayama and Masahiko Yachida 449 

61.1 Introduction 449 

61.2 Framework for Creative Activity 449 

61.2.1 User Discovers a Viewpoint of the Combination 450 

61.2.2 Support System for Search Systems 450 

61.2.3 Data Mining from Web Pages 451 

61.2.4 Interface for Knowledge Refinement 451 

61.3 Experimental System 452 

61.4 Conclusion 453 

62. An Approach to Support Long-Term Creative Thinking 
and Its Feasibility 

Hirohito Shibata and Koichi Hori 455 

62.1 Introduction 455 

62.2 System Overview 456 

62.3 Long-Term User Study 458 

62.3.1 Behavior Analysis on Pop-Up 458 

62.3.2 Effects and Open Problems 460 

62.4 Conclusions 460 

63. Chance Discovery by Creative Communicators Observed 
in Real Shopping Behavior 

Hiroko Shoji and Koichi Hori 462 

63.1 Introduction 462 







63.2 Collecting Protocols of Actual Purchase Activities 463 

63.3 Analysis and Result 463 

63.3.1 Expected Reaction 463 

63.3.2 Unexpected Reaction 465 

63.3.3 Successful Chance Discovery with 

Unexpected Reaction 465 

63.4 Discussion 466 

64. The Role of Counterexamples in Discovery Learning 
Environment: Awareness of the Chance for Learning 

Tomoya Horiguchi and Tsukasa Hirashima 468 

64.1 Introduction 468 

64.2 Chance Discovery in Learning Environment 469 

64.3 How to Design Effective Counterexamples 470 

64.4 Designing ‘Visible’ Counterexamples 471 

64.5 Discussion 473 

65. Integrating Data Mining Techniques and Design 
Information Management for Failure Prevention 

Yoshikiyo Kato, Takehisa Yairi, and Koichi Hori 475 

65.1 Introduction 475 

65.2 Fault Detection of Spacecraft by Mining Association Rules of 

Housekeeping Data 476 

65.3 Managing Information for Failure Prevention 477 

65.3.1 Using Design Information for Failure Prevention 477 

65.3.2 Design Information Repository 479 

65.3.3 Handling Anomalies 480 

65.4 Current Work and Conclusions 480 

66. Action Proposal as Discovery of Context 
(An Application to Family Risk Management) 

Yukio Ohsawa and Yumiko Nara 481 

66.1 Introduction : Which Opinions Grow into Consensus ? 481 

66.2 KeyGraph for Noticing Consensus Seeds from Questionnaire 482

66.3 Family Perception of Risks and Opportunities 483

66.3.1 The Results of KeyGraph 484

66.3.2 Which Opinions Grew into Consensus? 485 

66.4 Conclusions 485 

67. Retrieval of Similar Time-Series Patterns for Chance 
Discovery 

Takuichi Nishimura and Ryuichi Oka 486 

67.1 Introduction 486 







67.2 Reference Interval-Free Active Search 487 

67.3 Experiments 488 

67.4 Summary 489 

68. Fuzzy Knowledge Based Systems and Chance Discovery 

Vicenç Torra 491

68.1 Introduction 491 

68.2 Fuzzy Knowledge Based Systems 492 

68.3 System Architecture 493 

68.4 Conclusions 494 



Part V. Challenge in Knowledge Discovery and Datamining 



69. JSAI KDD Challenge 2001: JKDD01

Program Chair: Takashi Washio 499 

70. Knowledge Discovery Support from a 
Meningoencephalitis Dataset Using an Automatic 
Composition Tool for Inductive Applications 

Hiromitsu Hatazawa, Hidenao Abe, Mao Komori, 

Yoshiaki Tachibana, and Takahira Yamaguchi 500 

70.1 Introduction 500 

70.2 Ontologies for Inductive Learning 501 

70.3 Basic Design of CAMLET 502 

70.4 A Case Study of Knowledge Discovery Support Using a 

Meningoencephalitis Dataset 503 

70.4.1 Learning Rules from the View of Precision 504 

70.4.2 Learning Rules from the View of Specificity 505 

70.5 Conclusions and Future Work 507 

71. Extracting Meningitis Knowledge by Integration of Rule 
Induction and Association Mining 

T.B. Ho, S. Kawasaki, and D.D. Nguyen 508 

71.1 Introduction 508 

71.2 LUPC: Learning Unbalanced Positive Class 508 

71.3 Finding Rules from Meningitis Data 509 

71.4 Conclusion 512 

72. Basket Analysis on Meningitis Data 

Takayuki Ikeda, Takashi Washio, and Hiroshi Motoda 516 

72.1 Introduction 516 

72.2 Method for Selection and Discretization 517 







72.2.1 Algorithm 517 

72.2.2 Performance Measure 518 

72.3 Application 520 

72.4 Result and Expert’s Evaluation 521 

72.5 Conclusion 523 

73. Extended Genetic Programming Using Apriori 
Algorithm for Rule Discovery 

Ayahiko Niimi and Eiichiro Tazaki 525 

73.1 Introduction 525 

73.2 Genetic Programming 526 

73.3 Approach of Proposed Combined Learning 527 

73.4 Apply to Rule Discovery from Database 528 

73.4.1 ADF-GP Only 529 

73.4.2 Proposed Technique (Association Rules + ADF-GP) 529

73.4.3 Discussion for the Results 531 

73.5 Conclusions 531 

74. Medical Knowledge Discovery on the 
Meningoencephalitis Diagnosis Studied by the Cascade 
Model 

Takashi Okada 533 

74.1 Introduction 533 

74.2 The Cascade Model 533 

74.3 Results and Discussion 535 

74.3.1 Computation by DISCAS 535 

74.3.2 Diagnosis 536 

74.3.3 Detection of Bacteria or Virus 538 

74.3.4 Prognosis 539 

74.4 Concluding Remarks 540 

75. Meningitis Data Mining by Cooperatively Using 
GDT-RS and RSBR 

Ning Zhong, Ju-Zhen Dong, and Setsuo Ohsuga 541 

75.1 Introduction 541 

75.2 Rule Discovery by GDT-RS 542 

75.2.1 GDT and Rule Strength 542 

75.2.2 A Searching Algorithm for Optimal Set of Rules 544 

75.3 Discretization Based on RSBR 546 

75.4 Application in Meningitis Data Mining 546 

75.5 Conclusion 547 

Author Index 549 

Subject Index 551 




1. Social Intelligence Design - An Overview 



Toyoaki Nishida 

Department of Information and Communication Engineering 
Graduate School of Information Science and Technology 
The University of Tokyo 

7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan 
nishida@kc.t.u-tokyo.ac.jp 



1.1 Introduction 

The advent of the Internet and information technology has brought about
significant progress in augmenting the way people interact with each other, in a
totally new fashion that was not possible in the past. Examples of new technologies
include conversational agents that help people get to know and communicate
with each other, collaborative virtual environments for large-scale discussions,
personalized information tools for helping cross-cultural communication, and
interactive community media for augmenting community awareness and memory,
to name just a few. 

Sometimes new technologies induce the emergence of a new language and life- 
style. For example, interactive multimedia websites are a new medium and 
probably even a new language, with interesting new conventions, and increasing 
adaptation to the support of communities. Japanese teenagers have developed a 
new language for use originally with beepers and now with mobile phones. These 
are both new mainstream real world developments that should be studied further, 
and could probably give some valuable insights. 

The theme of Social Intelligence Design is really an angle on the support of
groups in pursuit of their goals, whether those are medical knowledge, stock trading,
or teenage gossip. Social Intelligence Design gives new life to Agent Technology
and Artificial Intelligence research in general by treating humans as an integral
part of the big picture: it shifts the focus from building artifacts with
problem-solving or learning abilities to designing a framework of interaction that leads
to the creation of new knowledge and relationships among participants. Promising
application domains of Social Intelligence Design include collaborative environments,
e-learning, knowledge management, community support systems, symbiosis of
humans and artifacts, crisis management, and digital democracy. 

In what follows, I will overview the major issues involved in Social Intelligence
Design and attempt to structure them into a coherent story.* 



* The following description is indebted to the discussions at JSAI-Synsophy International 
Workshop on Social Intelligence Design, Matsue, Japan, May 21-22, 2001. 



T. Terano et al. (Eds.): JSAI 2001 Workshops, LNAI 2253, pp. 3-10, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 
1.2 Horizon of Social Intelligence Design 

Social Intelligence Design is a discipline aimed at understanding and supporting
social intelligence. Conventionally, social intelligence has been discussed in the
context of an individual’s ability, e.g., the ability to manage relationships
with other agents and act wisely in a situation governed by an implicit or explicit
set of shared rules, based on the ability to monitor and understand other
agents’ mental states. It is distinguished from other kinds of intelligence such as
problem-solving intelligence (the ability to solve logically complex problems) or
emotional intelligence (the ability to monitor one’s own and others’ emotions and to
use the information to guide one’s thinking and actions). 

Alternatively, social intelligence might be attributed to a collection of agents
and defined as an ability to manage complexity and learn from experience as a
function of the design of the social structure. This view emphasizes the role of
social rules or culture that constrain the way individual agents behave. We might
attribute good social behavior to a good social structure and consider that a good
social structure allows the members of the community to learn from each other. 

In Social Intelligence Design, we intermingle these two views and look at both
sides of social intelligence. The "social intelligence as an individual’s ability"
view is related to designing personal assistants or socially intelligent agents. On
the other hand, the "social intelligence as a collective ability" view is concerned
with the design of group/community support systems. 

Social Intelligence Design is truly an interdisciplinary field. The engineering
aspects of Social Intelligence Design involve the design and implementation of
systems that range from group/team-oriented collaboration support systems that
facilitate intimate, goal-oriented interaction among participants, to community
support systems that support large-scale online discussion. The scientific aspects
of Social Intelligence Design are concerned with cognitive and social psychological
understanding of social intelligence. In addition, economics, sociology, ethics, and
many other disciplines constitute the foundation of Social Intelligence Design.
Engineering approaches should be tightly coupled with sociological and cognitive
approaches to predict and assess the effects of social intelligence augmentation
systems on human society. On the other hand, novel insights may be obtained
in sociology, cognitive psychology, and other humanities by investigating a
new virtualized society where humans and artifacts cohabit. 

Typical applications of Social Intelligence Design are group/team support sys- 
tems and community support systems. Community support systems provide rather 
long-range, bottom-up communicative functions in the background of daily life. 
Major issues are: (i) exchanging awareness with other members, (ii) exploring 
human and knowledge networks, (iii) building community knowledge, (iv) or- 
ganizing public events, (v) forming a group/team for collaborative work, (vi) 
helping negotiate with others, and (vii) supporting public discussions and decision 
making about the community. In contrast, group/team support systems focus on 
facilitating more intimate collaboration among members. Thus, group/team sup- 
port systems emphasize more task-driven, short-range collaboration, although 
awareness is equally emphasized. 
Table 1. Horizon of Social Intelligence Design 

• methods of establishing the social context 

- awareness of connectedness [16] 

- circulating personal views [6] 

- sharing stories [20] 

• embodied conversational agents and social intelligence 

- knowledge exchange by virtualized egos [8] 

- conversational agents for mediating discussions [18] 

- a virtual world inhabited by autonomous conversational agents [15] 

- social learning with a conversational interface [9] 

- conversations as a principle of designing complex systems [7] 

- artifacts capable of making embodied communication [19] 

• collaboration design 

- integrating the physical space, electronic content, and interaction [3] 

- using a multi-agent system to help people in a complex situation [2] 

- evaluating communication infrastructure in terms of collaboration support [11] 

• public discourse 

- visualization [14] 

- social awareness support [14] 

- integrating Surveys, Delphi, and Mediation for democratic participation [10] 

• theoretical aspects of social intelligence design 

- understanding group dynamics of knowledge creation [1] 

- understanding consensus formation process [13] 

- theory of common ground in language use [17] 

- attachment-based learning for social learning [12] 

• evaluation of social intelligence 

- network analysis [5] 

- hybrid method [4] 



The scope of Social Intelligence Design as a discipline of understanding and
supporting social intelligence is summarized in Table 1. Social Intelligence
Design is concerned with the design and implementation of novel means of
communication for mediating interaction among people and agents. The scope
ranges from preliminary and preparatory interactions among people, such as
knowing who’s who, to more intimate interaction such as collaboration.
Supporting group formation, collaboration, negotiation, public discussion, or social
learning is considered to be an important application of Social Intelligence Design.
Theoretical aspects, as well as pragmatic aspects, should be taken into account in
designing, deploying, and evaluating social intelligence support tools. In the rest
of this section, I will overview major issues in Social Intelligence Design. 
1.2.1 Methods of Establishing the Social Context 

Common ground needs to be established in order for social intelligence to
emerge from interaction among agents, especially when the agents are located in
geographically distant places. A subfield of Social Intelligence Design is devoted
to the design of new communication media for a community or a group. The
role of a communication medium is not only to meet primary communication goals,
i.e., transmitting the intended content, but also to provide contextual information
that may help interpret the content. It is often the case in our daily life that
conversation is not for achieving higher-level goals such as information seeking, but
merely for social interaction such as maintaining human relations. Such social
interaction is important for building a social context such as trust. 

One approach is to support social awareness. Ohguro proposes to support the
awareness of connectedness with FaintPop, which is a nonverbal communication
device similar to a photo frame [16] in which small photos or icons of the user’s
colleagues are displayed. FaintPop allows the user to communicate her/his feelings
towards her/his colleagues by using three types of touching (a tap to
communicate a neutral feeling, a pet for a positive feeling, and a hit for a negative
feeling). 

In contrast, one may design a verbal communication medium to exchange more
explicit information. The Public Opinion Channel (POC) [6] is a community-wide
interactive broadcasting system. POC continuously collects messages from
people in a community and broadcasts summarized messages to the community.
POC is not intended to be a system that broadcasts public opinions per se.
Instead, it is intended to broadcast people’s personal views arising in daily life,
e.g., questions, stories, findings, jokes, or proposals. These messages are
considered to form a social context that can serve as a basis for public opinion formation. 

IBM’s WorldJam [20] is a large-scale corporate-wide discussion in which all
IBMers worldwide are invited to participate. The system provides an interface
that allows each participant to quickly see a concurrent view of who else is
present and which topics are being discussed. Thomas suggests that the keys to
innovation lie in designing interfaces that can (i) facilitate engagement, (ii) allow
the user to bring to bear necessary skills, talents, and knowledge sources on the
problem, and (iii) use appropriate representations of the situation. He also points
out the importance of stories and organizational issues. Stories allow the user to
associate the content with previous experience. The organizational structure
consisting of such people as moderators and facilitators plays a critical role in the
WorldJam large-scale discussion experiment. 



1.2.2 Embodied Conversational Agents and Social Intelligence 

Conversation plays a variety of roles in human societies. It not only allows people
to exchange information in a casual fashion, but it also helps them create new
ideas or manage human relations. 

Embodied conversational agents can be used to augment social intelligence by
mediating conversations among people. Kubota and Nishida use a
talking-virtualized-egos metaphor in EgoChat [8] to enable sophisticated asynchronous
communication among community members. A virtualized ego mainly performs two
functions. First, it stores and maintains the user’s personal memory. Second, it
presents the content of the personal memory on behalf of the user in appropriate
situations. A virtualized ego serves as a portal to the memory and knowledge of a
person. It accumulates information about a person and allows her/his colleagues to
access the information through an ordinary spoken-language conversation,
not by going up and down a complex directory in search of information that may
or may not exist, or by deliberately issuing commands for information retrieval.
In addition, a virtualized ego may embody tacit and nonverbal knowledge about the
person so that more subtle messages such as attitude can be communicated.
Takahashi and Takeda use avatar-like conversational agents in a similar vein [18].
The user can use her/his agent to give comments on a web page. This extends
collaborative annotation in such a way that users can encode subtle feelings in
the emotional expressions of agents. 

In more sophisticated applications, building a rich conversational environment
becomes more important. Nijholt argues for building a theater environment that
provides the user with an information-rich virtual environment mimicking real
theater buildings in a real town, where autonomous agents with varying abilities
cohabit [15]. The theater environment allows the user to be immersed in the
virtual world and follow continuous verbal/nonverbal interactions with agents. He
introduces an internal model of autonomous agents in terms of beliefs, desires,
plans, and emotions to realize a theater community. 

Conversational characters can also be employed in learning environments.
Kaktus is a computer game environment designed so that teenage students
can interact with semi-autonomous, emotionally intelligent characters to learn
socio-emotional relations [9]. The notions of conversation and social intelligence
are also useful in designing complex systems. Goguen suggests experimenting with an
appropriate blend of interaction metaphors in building interfaces to theorem
provers [7]. 

In real-world applications, further issues such as embodiment should be taken
into account. Terada and Nishida discuss designing an artifact capable of
embodied communication with people and other agents [19]. An interesting issue
is how one can allow agents with different embodiments to communicate with each
other. 



1.2.3 Collaboration Design 

Collaboration design is concerned with goal-oriented, more intimate interaction. 
In addition to basic communication facilities, the nature of interaction in collabo- 
rative activities should be studied in detail. 

Principles and guidelines are necessary to design collaboration support systems.
Fruchter points out that it is beneficial to consider collaboration from three
perspectives, namely, physical spaces ("bricks"), electronic content ("bits"), and the
way people communicate with each other ("interaction") [3]. She suggests that by
properly understanding the relationship between bricks, bits, and interaction, one
can design spaces that better afford communicative events, develop collaboration
technologies that can best support the joint activities of people, and engage people in
rich communicative experiences that enable them to immerse themselves in their
activity. 

Sometimes, e.g., in the case of an emergency, it is desirable for a collection of
people to be guided by socially intelligent agents in order to avoid a panic.
Cardon proposes to employ a hierarchy of multiple agents consisting of what he calls
aspectual agents and morphological agents [2]. In order to cope with the outcome
of unexpected structure, he emphasizes the importance of a mechanism that allows
meaning to be dynamically generated in communication. 

The communication infrastructure may influence the way distant collaboration
is carried out. For example, replacing HDTV (High Definition TV) with normal video
may make a qualitative difference in collaboration style. Mark and DeFlorio suggest
that since HDTV provides a high-resolution image, people do not use the
exaggerated gestures or movements to convey expression that were reported to
occur in normal videoconferences [11]. 



1.2.4 Public Discourse 

A group/community/society as a whole has to make decisions from time to time.
Effective use of information and communication technologies is sought to
support public discussion and decision-making. Nakata argues that critical issues in
designing a discussion support system are (i) ease of information access and
proactive information gathering, (ii) user-friendly access to a scientific analysis
toolkit, (iii) evaluation of deliberative states, and (iv) guiding discussion through
discussion and consensus generation models [14]. He also points out the importance
of supporting individuals so that they can collect and exchange information and
opinions. 

Information and communication technology might bring a novel participation
and discussion scheme into democracy. Luehrs et al. [10] attempt to combine
survey techniques, Delphi approaches, and mediation methods into a new
methodology for on-line democratic participation and interactive conflict resolution.
Their system integrates mass opinion polls, a cyclical decision-making process
exploiting expert knowledge, and an open process of participative conflict
resolution, adapted from Surveys, Delphi, and Mediation, respectively. 



1.2.5 Theoretical Aspects of Social Intelligence Design 

Theories play several roles in Social Intelligence Design. In addition to their
principal role of providing a framework for understanding phenomena, theories offer
more direct implications, such as guidelines for designing community/group
support systems or an inventory of known pitfalls that should be taken into
account in system design. 

In social psychology, notorious phenomena such as groupthink (i.e., a phenomenon
in which collective creativity does not exceed individual creativity) or hostility
to out-groups (i.e., group members easily become hostile to out-groups) are known
to hinder effective knowledge creation in a networked community. Azechi
classifies the content of a message into dry and wet information [1]. Dry information
primarily contains logical linguistic information and constitutes the core of a
message. In contrast, wet information is mainly nonlinguistic meta-information
incidental to the contents of the message. Azechi argues that community-wide
discussion for achieving some practical goal should be conducted only with dry
information; otherwise rational discussion will be hindered by the pathology of a group.
Matsumura addresses consensus formation in networked communities [13].
Based on social psychological experiments, he has found that (i) minority
members tend to overestimate the number of other members who share the same
attitude, (ii) minority members tend to underestimate the attitude of other members, and
(iii) minority members who underestimate the proportion of the minority’s opinion
tend to lose the intention to act. Such inaccuracy in the perception of opinion
distribution is called the false consensus effect. These observations should be taken into
account in designing discussion support systems so that useful discussions can be
expected by reflecting minority opinions. 

Theories of language use in interaction are relevant to establishing common
ground in collaboration. Rosenberg suggests that the key issues are the integration
of information into a common ground, the relation between linguistic channels and
shared knowledge, and the mechanism of retaining shared knowledge in the
common ground of different kinds of participants [17]. In the context of social
learning, Marlow and Peretti explore attachment-based learning comprising response
imprinting and mimicry. They have built a learning environment to test the
hypothesis [12]. 



1.2.6 Evaluations of Social Intelligence 

Social Intelligence Design is certainly an empirical study. We have to repeat the 
design-implement-evaluation cycle until we reach better systems. 

Network Analysis is a powerful means of evaluating or comparing empirical 
data. It provides us with a means for calculating various aspects of a given net- 
work in terms of centrality, density or cohesion. By comparing those features 
from one network against those from another, we can describe the similarity and 
difference in quantitative terms. Fujita has conducted a field trial and employed 
network analysis to show the effectiveness of their community support system [5]. 
Fujihara has also applied network analysis to a log collected from experiments 
with a POC prototype [6] for several months to see if POC actually facilitates 
community knowledge creation [4]. He also points out that network analysis 
alone is not enough to evaluate community support systems, and hence it should 
be combined with several other methods such as the user’s subjective analysis or 
log analysis. 
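
As a rough illustration of the kind of quantitative comparison described above, the following minimal sketch computes density and degree centrality for two small undirected networks. The member names and ties are hypothetical and are not taken from the field trials in [4] or [5]; the measures follow their standard textbook definitions, not any particular tool used by the authors.

```python
from collections import defaultdict


def density(nodes, edges):
    """Fraction of all possible undirected ties that are actually present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(edges) / (n * (n - 1))


def degree_centrality(nodes, edges):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(nodes)
    return {v: (degree[v] / (n - 1) if n > 1 else 0.0) for v in nodes}


# Hypothetical community of four members, observed at two points in time.
# Each tie is listed once, as an unordered pair.
members = ["a", "b", "c", "d"]
ties_before = {("a", "b")}
ties_after = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")}

print(density(members, ties_before), density(members, ties_after))
print(degree_centrality(members, ties_after))
```

Comparing the two density values, or the centrality of a given member in each network, gives the kind of quantitative description of similarity and difference that the text refers to.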
1.3 Concluding Remarks 

Social Intelligence Design is a discipline aimed at understanding and supporting
social intelligence. In this paper, I have overviewed major issues involved in
Social Intelligence Design and attempted to structure them into a coherent story. The
contemporary view of Social Intelligence Design consists of methods of
establishing the social context, embodied conversational agents, collaboration design,
public discourse, theoretical aspects of social intelligence design, and evaluation
of social intelligence. 
We propose a tool called FaintPop. It is intended to be an alternative medium
that is suitable for a very light-weight, acknowledge-only mode of
communication. Furthermore, it intuitively provides, through memories of
communication, a general overview of the communication activities. The tool is designed
for a community, within which the sense of connectedness can be shared among
members. Results from an initial experiment are reported briefly. 



2.1 Social Intelligence Design for Communications 

Although the IT (Information Technology) bubble is said to have burst, the
Internet and IT remain essential and continue to advance significantly.
There is plenty of evidence for this trend; to mention just one
example, mobile phone services are rushing toward the 3G era, in which
more ubiquitous and broadband communications will be fully utilized. The
trend shows that our lifestyle, as well as our society, is surely being
affected by the Internet and IT. It is hard nowadays to imagine working, living, or
communicating without the network. The important question now is to determine
what design will best augment social intelligence for the network age.
More specifically, we focus on the communication environment for emerging
networked societies, since communication is the very basis of those societies. 

In addressing this question, we first look at a problem that is already
appearing and is likely to grow in the future. The problem, Communication
Overflow [2.15], consists of two related subproblems. One is that our
opportunities for communication are much greater than ever before. This trend is
sometimes so overwhelming that our communications become segmented into
pieces and we lose the general view of our own communication activities.
The other is that we do not have enough variants of network communication
media to support the various communication modes common in our daily
lives. For example, current network communication media seem too heavy
for simply saying “Hi,” which is a frequently used communication mode in
physical environments. Using unsuitable media imposes a cognitive load. 

The notion of “Communication Overflow” is closely related to the problem
of “Information Overflow.” However, our focus is not information itself. In
other words, our primary focus is not on the “developer’s side,” which mainly
addresses the tools and abilities offered by technologies. Instead, we focus on
the experience of users [2.18]; that is, our primary focus is on the
“user’s side,” which mainly concerns how tools are used, in what situations,
and by whom. Therefore we use the term “Communication Overflow” to
make clear that the problem lies in the possible overflow of users’ opportunities
for, and awareness of, their own communication activities. 

To address the above problem, we proposed a new communication
environment [2.14]. Awareness of Connectedness is the key notion for
understanding the environment. Here we focus on the awareness of (the
communication activities of) oneself. Moreover, transmitting and sharing the sense of
connectedness (awareness of “connected” status with others) is also a
primary concern. This contrasts with the term “awareness” as used in the area of
groupware, in which awareness information about the other participants involved
in the current communication is the central issue, and the information serves
to supplement the contents of communication (e.g., [2.7]). 

Two candidate tools for the environment have been introduced. One is called the
Indicator, which is intended to provide feedback on the user’s communication
activities [2.4]. It provides a general overview of the user’s communication
activities, which is easily lost in the current segmentation of communications. The
other is called Gleams of People, which is a simple, intuitive interactive medium
that exchanges the presence and statuses of users [2.16]. It is designed to be
an alternative communication medium that is very light-weight and suitable
for the acknowledge-only mode of communication. 

As the first tools for the new communication environment, both tools were
designed for personal use, since individuals’ awareness of communication
and connectedness is fundamental to social communications. The aspects of
communities and societies are not addressed directly by the tools, though
they can be derived implicitly from the participants in
individuals’ communication activities. However, since we mostly belong to and act in
communities, tools that address these aspects will also be needed. Therefore,
in this paper, we introduce a third tool for the communication environment. 

The tool, called FaintPop, subsumes the functions of the two tools
mentioned above, but is designed to be a medium for a community. More specifically,
it provides an alternative communication medium that is very light-weight and
suitable for the acknowledge-only mode of communication, through which
the sense of connectedness will be shared across the community. Moreover,
it provides a general overview of communication activities in the
community. The tool works in a suggestive way [2.15]. That is, it does not provide
logical analysis such as comment chains or statistics; instead, a general view
is provided that offers a more intuitive but vague picture of what is going
on in the community. In this way, the tool will retain the social relationships
among the community members. 
2.2 In Touch with the Social Relationships 

We incorporated a scenario-based technique [2.3] in designing the tool. 

The scenario: This snapshot shows my old friends. Most of us live far apart,
but our friendship might remain. Sometimes I feel like contacting them,
but find it hard to do so. Isn’t it odd to make a phone call or to write a
letter without any “important” business? What I want is merely a slight
touch to show that we are still friends; just a faint sense of connectedness
would suffice. I wonder if I can do this using just this snapshot. 

FaintPop implements this scenario; it is a medium for sharing the sense of
connectedness in a community. Messages exchanged using this medium are the
sort of things that are not important enough to talk about, but are worth
expressing. That is, the communication established by this medium is not about
important business matters, but about feelings, which are very important in social
relationships. The communication does not involve written or spoken language;
instead, the more intuitive technique of touching is employed. Moreover, memories of
communication are summarized and represented graphically in an intuitive way. This
gives the users a general view of what’s going on in the community. 

Figure 2.1 shows two FaintPop prototypes. FaintPop is a hardware device
shaped to resemble a photo frame. Each member of the community has his/her
own device, and all pictures are the same initially. All of the devices are
networked. Instead of using real photographs, small pictures (or icons) of the faces of
all members of the community (possibly extracted from the original photo)
are displayed. Members can communicate with each other by touching the
images/icons of friends. Written or spoken language is not supported, because for
the contents-oriented mode of communication in which such languages are
involved, conventional media such as e-mail and phone are more suitable.
Instead, FaintPop is oriented to the very light-weight, acknowledge-only mode
of communication, in which the main purpose is not to convey some content
but to share the sense of connectedness. 




Fig. 2.1. FaintPop prototype 
Three touching types are provided: a tap to represent a neutral feeling, a
pet to represent a more positive feeling, and a hit to represent a rather negative
feeling. Due to the limitations of the device, these three types are currently
implemented as a click, a long click, and a double click, respectively. Most
users easily learned to input the right type by touching the screen with a
finger. Our current design choice is to offer just these three types. Whether
these three are enough remains to be confirmed. Some studies indicate that six
or more basic emotions exist [2.1]. However, some of these emotions (e.g., fear
and anger) are not appropriate for the light-weight, acknowledge-only mode
of communication. Furthermore, providing too many types would confuse
users, complicate the interface, and conflict with the objective of the medium. 
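
The mapping from input gestures to touching types can be sketched as follows. This is only an illustration under stated assumptions: the timing thresholds and the function name `classify` are hypothetical, since the text does not say how the prototype distinguishes a click, a long click, and a double click.

```python
from enum import Enum


class Touch(Enum):
    TAP = "neutral"    # click
    PET = "positive"   # long click
    HIT = "negative"   # double click


# Assumed thresholds in seconds; the actual device settings are not given.
LONG_CLICK_SECONDS = 0.5
DOUBLE_CLICK_GAP_SECONDS = 0.3


def classify(press_duration, gap_to_next_press=None):
    """Map a raw screen press to one of the three touching types.

    press_duration: how long the finger stayed on the screen.
    gap_to_next_press: time until a second press, or None if there was none.
    """
    if gap_to_next_press is not None and gap_to_next_press <= DOUBLE_CLICK_GAP_SECONDS:
        return Touch.HIT
    if press_duration >= LONG_CLICK_SECONDS:
        return Touch.PET
    return Touch.TAP


print(classify(0.1))                         # a short click
print(classify(0.8))                         # a long click
print(classify(0.1, gap_to_next_press=0.2))  # a double click
```

Keeping the set of types in a single enumeration reflects the design choice discussed above: adding a fourth feeling would mean adding both a new member and a new, distinguishable gesture.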

Touching a picture of a friend means that one of the three feelings is
passed to that friend. The touch is encoded and distributed (via the network
connections) to all the members in the community, so that all members can
share what is going on in the community. The sending of a touch is displayed
on all members’ screens as an animation effect: a small ball travels from the
sender to the recipient, with different colors and speeds according to the
three types of touching. The picture of a friend who received a message with a
positive (negative) feeling blinks larger (resp. smaller) for a while. For a neutral
feeling, the picture oscillates for a while. Touching one’s own picture
broadcasts a message to all community members; in other words, the user
calls out to the community with that feeling. Figure 2.2 shows the animation effect
when a user broadcasts a message with a negative feeling. 
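
One way to picture the encoding and distribution of a touch is the sketch below. The message format (JSON), the class `TouchEvent`, and the helper names are assumptions made for illustration; the text does not describe the wire format used by the prototype.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TouchEvent:
    sender: str    # member who touched a picture
    target: str    # member whose picture was touched
    feeling: str   # "neutral", "positive", or "negative"

    @property
    def is_broadcast(self):
        # Touching one's own picture calls out to the whole community.
        return self.sender == self.target


def encode(event):
    """Serialize an event so it can be distributed to every device."""
    return json.dumps(asdict(event))


def decode(payload):
    return TouchEvent(**json.loads(payload))


def addressees(event, members):
    """Members whose pictures should react to the event on every screen."""
    if event.is_broadcast:
        return [m for m in members if m != event.sender]
    return [event.target]


community = ["alice", "bob", "carol"]  # hypothetical members
pet = TouchEvent("alice", "bob", "positive")
shout = TouchEvent("alice", "alice", "negative")
print(addressees(decode(encode(pet)), community))
print(addressees(decode(encode(shout)), community))
```

Note that every event is sent to all devices regardless of its addressees, since each screen animates the ball travelling from sender to recipient; `addressees` only determines which pictures blink or oscillate.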

The tool is modeled after a photo frame so that it can be placed and
embedded naturally in daily life. Therefore, the interface should not annoy the
user, for example by flashing the whole screen. However, such a non-disturbing
design has the drawback that the user may miss communication that is taking
place. One solution is to use a faint sound to indicate that some activity
related to the user is occurring (for example, a message from another user has
arrived). Different sounds are used according to the three types of messages. 



Fig. 2.2. Screen image of FaintPop. A broadcasting message, traces of
communications and recent activities of users are shown. (Labels in the
figure: “Recent activities of each friend”, “Traces of recent communications”.)

Another technique is to provide memories of communications. In the
background of the pictures, traces of the animation effects, which correspond
to the communications held, are left. Moreover, information on which members
have been actively communicating recently is indicated by the changing color
of the bulls-eye surrounding each friend’s picture; it is represented as a
pie chart that indicates which types of messages that user has sent recently
(Figure 2.2). These memories gradually disappear with time. This provides a
general view of communications in the community. The feature suits the
nature of the tool well: a typical use case is that the user glances
at the “photo frame” occasionally and notices that something has happened
among the friends. The tool therefore has the aspect of an asynchronous
communication medium, in addition to the aspect of synchronous communication
by touches. Hence, memory retention periods range from hours to a day, longer
than those of most (synchronous) communication media.
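
A minimal sketch of such gradually disappearing memories is given below. The
linear fade, the 24-hour default retention period and the names trace_opacity
and visible_traces are assumptions made for illustration; the text only states
that memories fade over hours to a day.

```python
def trace_opacity(age_hours, retention_hours=24.0):
    """Return an opacity in [0, 1] that fades linearly to zero
    (assumed fade curve; the retention period is a parameter)."""
    if age_hours < 0:
        raise ValueError("age must be non-negative")
    return max(0.0, 1.0 - age_hours / retention_hours)


def visible_traces(traces, now_hours, retention_hours=24.0):
    """Return (id, opacity) pairs for traces that have not fully faded.

    Each trace is a dict with an "id" and the "time" (in hours) at which
    the communication took place.
    """
    result = []
    for trace in traces:
        opacity = trace_opacity(now_hours - trace["time"], retention_hours)
        if opacity > 0.0:
            result.append((trace["id"], opacity))
    return result
```

A user who glances at the frame several hours after an exchange would still see
a faint trace, which is the asynchronous aspect described above.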

The touches that users make are not only visualized as animation effects
and memories of communication but also affect the default location of each
friend’s picture. FaintPop holds closeness parameters between the friends,
which are naturally asymmetric. When the user touches a friend positively
(negatively), the acquaintance parameter from the user to the friend is
increased (resp. decreased). The picture of the friend is then moved closer to
(resp. away from) that of the user. Therefore, the locations of the pictures
displayed on a user’s “photo frame” represent the closeness from the user to
the friends. A single touch triggers just a slight change. Again, this is a
long-term effect, and in this respect it is similar to the memories of
communication.
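
The asymmetric acquaintance parameters can be sketched as a table of directed
values. The step size, the [0, 1] range, the initial value, the pixel distances
and the treatment of neutral touches as "no change" are all assumptions for
illustration; the text does not specify them.

```python
from collections import defaultdict

STEP = 0.05  # assumed size of the slight change caused by a single touch


class Closeness:
    """Directed (asymmetric) closeness values kept in the range [0, 1]."""

    def __init__(self, initial=0.5):
        self._values = defaultdict(lambda: initial)

    def get(self, src, dst):
        return self._values[(src, dst)]

    def touch(self, src, dst, feeling):
        # Positive touches raise src->dst, negative touches lower it.
        # Neutral touches are assumed to leave it unchanged.
        # The reverse direction dst->src is never modified here.
        delta = {"positive": STEP, "negative": -STEP}.get(feeling, 0.0)
        value = min(1.0, max(0.0, self.get(src, dst) + delta))
        self._values[(src, dst)] = value
        return value


def display_distance(closeness, min_dist=50.0, max_dist=300.0):
    """Map closeness to a screen distance: closer friends are drawn nearer."""
    return max_dist - closeness * (max_dist - min_dist)
```

Because the table is keyed by ordered pairs, the value from A to B can differ
from the value from B to A, which matches the asymmetry noted in the text.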

The user can learn the closeness between friends (or the closeness from
friends to the user) by dragging the picture of a friend (or of the user). When
the picture of friend A is dragged to that of friend B, the picture of friend B
responds. If the acquaintance parameter from B to A is high (low), the picture
of B moves closer to (resp. away from) the picture of A. Note that the
acquaintance parameters are asymmetric: dragging the picture of B to that of A
may cause a different movement. This effect ends when the user stops dragging,
and all the pictures are returned to their default locations. The dragging
itself neither generates a message nor is shared among the friends, but the
information that the user performed a drag is distributed. It is shown in the
activity summary (pie chart) surrounding the picture of the user.

Privacy and one-to-one communication issues are important but are not
addressed directly by this medium. This is because our main focus is to provide
an alternative communication medium that allows the sense of connectedness
to be shared in the community. For one-to-one communication, we have
introduced another medium called Gleams of People [2.16]. It might be desirable
to integrate these media in the future.



2.3 Initial Experiment

To verify whether the basic objectives of the tool were accomplished, we
conducted a preliminary experimental study. To match our scenario (section
2.2), six subjects of similar age who knew each other, but who were located in
different offices and belonged to different work teams, were selected
from our laboratory members. Before the experiment, the subjects were
instructed in the basic usage of the tool. However, the objective of the tool,
as well as when and for what purpose they were supposed to use it, was not
explained. A traffic log of the tool was collected during the experiment. After
the one-week experiment, the subjects were asked to answer a questionnaire,
mainly about the occasions on which they sent messages and what they expected
to communicate.

It was well accepted as an alternative communication medium for
a community. Communication using FaintPop was more frequent than e-mail and
phone calls: an average of 13.4 messages per subject per day. Moreover,
subjects reported that they would like to use a medium like FaintPop with
close, intimate friends, but not with people they are not close to or with
bosses. Although the objective of the tool and our scenario were not explained,
subjects understood the nature and objective of the tool through the
experiment.

FaintPop was used as a very light-weight medium for an
acknowledge-only mode of communication. Subjects sent single messages mainly
to express casual greetings and simple replies (acknowledgments) to messages
received. Broadcast messages were used mainly to express friendly greetings
when the sender’s status changed (e.g., “see you tomorrow”). Figure 2.3 shows
the daily usage of FaintPop. In the 10:00 period, the largest number of
broadcast messages was sent: subjects issued friendly greetings, saying good
morning to the community. Subjects actively dragged the pictures of friends in
the 15:00 period: they were between tasks, and their moods changed (or they
were trying to change them). In the 17:00 period, subjects actively sent single
messages, trying to change their mood by expressing casual greetings. One
questionnaire reply read: “Around 17:00, I felt sympathy with friends that they
also were taking a pause between tasks, because many friends actively used the
tool.”
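
A per-hour breakdown of this kind can be derived from a traffic log roughly as
sketched below. The log format of (timestamp, kind) pairs and the function
names are hypothetical, and any records passed in are illustrative only; they
are not the data collected in the experiment.

```python
from collections import Counter


def hourly_counts(log):
    """Count activities per (hour, kind) from (timestamp, kind) records.

    Timestamps are "HH:MM" strings; kinds are assumed to be
    "single", "broadcast" or "drag".
    """
    counts = Counter()
    for timestamp, kind in log:
        hour = int(timestamp.split(":")[0])
        counts[(hour, kind)] += 1
    return counts


def busiest_hour(counts, kind):
    """Return the hour with the most activities of the given kind, or None
    if no activity of that kind was logged."""
    per_hour = {h: n for (h, k), n in counts.items() if k == kind}
    return max(per_hour, key=per_hour.get) if per_hour else None
```

Grouping by hour and by kind of activity is what allows statements such as
"broadcasts peak in the morning" to be read off the log.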

The general overview of the communication activities was
accepted positively. Memories of communications, both the pie charts of
recent activities and the traces of communications, were received positively.
However, detailed opinions varied. For example, some subjects reported that
too many traces lasted too long, while others reported that traces disappeared
too quickly. Therefore, there is room to refine the representation.

It can retain the social relationships among the members of the
community. The questionnaire replies indicated a slight improvement in the
sense of closeness among the subjects, but the effect was not clear. However,
one subject reported: “I often used FaintPop when I heard a sound from it. I
felt the sense of connectedness through the sounds, then confirmed the
situation by watching the screen. Now the experiment is over and I miss the
sounds.”

Fig. 2.3. Total number of activities during the one-week experiment



2.4 Conclusion and Related Works 

A tool called FaintPop has been introduced to demonstrate one answer to the
problem of communication overflow. It is a medium designed for a community,
with which the sense of connectedness can be shared among members. It
is intended to be an alternative medium suitable for a very light-weight,
acknowledge-only mode of communication. Furthermore, it provides, through
memories of communication, a general overview of the communication
activities in the community. Results from an initial experiment using the
medium were reported. They suggest that the basic objectives of the tool are
achieved. Longer-term experiments will reveal in more detail whether its
objectives are accomplished and how people accept and use (or refuse) the
medium.

For communities and groups, several visualization tools for communications
have been studied [2.5, 2.10, 2.11]. However, they are sometimes oriented
towards the logical, analytic aspect of the activities, or the communication
medium and the visualization are separated. In contrast, FaintPop is
intended to be a communication medium that also offers intuitive visualization
of (memories of) communication. Several studies try to support communities
[2.8, 2.9, 2.13]. These studies are closely related to ours; however, most
focus on the content-oriented mode of communication.

Several studies use devices modeled after a photo frame.
Kodak and StoryBox Network (www.storybox.com) started a service named
Smart Picture Frame. While the sense of connectedness seems to be within its
scope, it merely shares pictures; it is not a communication medium and does
not use touches. In [2.12] the concept of a digital picture frame is
introduced. It tries to visualize the everyday activities of the person in the
picture by using icons on the frame. It intends to foster relationships within
distributed families, and so is closely related to Familyware [2.6]. Though the
objective is close to ours, their main concern is sensing and visualizing the
status of a member. A light-weight communication medium that uses photo
frames and feathers is proposed in [2.17]. inTouch is also a light-weight
medium using touches [2.2]. However, these works are basically for one-to-one
communication, and memories of communication are not well supported.



3. From Virtual Environment to Virtual Community

3.1 Introduction

We discuss a virtual reality theater environment and its transition to a virtual 
community by adding domain agents and by allowing multiple users to visit this 
environment. The environment has been built using VRML (Virtual Reality
Modeling Language). We discuss how our ideas about this environment changed
over time as we added more facilities to it and paid more attention to
potential users. Rather than a goal-directed information and transaction
system, the environment is evolving into a virtual community where differences
between visitors and artificial agents can become blurred. Before describing
our own environment and its development, we survey the research areas that now
allow the building of 3D embodied and animated agents that show intelligence
and personality and that can inhabit our environment.



3.2 Towards Multi-user Virtual Worlds 

The first networked virtual worlds were text-based. They became known as MUDs 
(Multi-User Domains) and they allowed communication between users and access 
to a shared database with text descriptions of users and objects. In these
environments the personality of a user shows in the content and style of the
text utterances the user produces, in his or her turn-taking behavior and,
more generally, in the moods (as they show) and attitudes towards the
community that can develop in such environments. Graphical multi-user
environments were introduced in the 1980s. In a
typical setting we have a background image showing the entrances to several lo- 
cations or rooms in the environment or we are in one of these 2D locations and we 
can choose one of the other visitors (or all of them) to talk to. Typically, visitors 
can present themselves by choosing an avatar (a 2D object) and its predefined 
animations. These animations are simple (a waving gesture, a jump of joy, . . .). 
Most interactions are text-based, by using chat windows and text balloons that ap- 
pear above the head of avatars that take part in the discussion. 

With the advent of VRML, virtual worlds could be designed for the World Wide
Web. Rather than for chatting, these worlds were meant to be explored, to
explain, or to allow the simulation of a particular activity in which the
visitor was involved. Virtual reality applications were already there and,
rather than being considered as a technology to design communities, distributed
virtual reality was explored for all kinds of applications. Virtual worlds
intended for meeting other people entered the arena. In these worlds multiple
visitors can share the scenes. In the more advanced worlds users can change
parts of the world and can have sophisticated visual representations that can
interact without being restricted to predefined gestures. An avatar can be
made to resemble the human user by photographic means.

The worlds that we consider may have collision and gravity features that be- 
come visible in the movements of avatars. There can be real-time voice communi- 
cation and in addition there can be lip-sync facial gestures. Despite adding such 
features, there remains an enormous gap when we compare the capabilities of the 
avatars and talking heads with those of the humans they represent. One way to 
close this gap is to give the human user the ability to control the avatar in a much 
more detailed way. One possibility is to have them explicitly controlled online by 
the user and captured from verbal and non-verbal input or from body movements. 
In addition to the avatars that represent humans, we can add domain avatars
to the environment to increase the sense of reality. They should be animated, but 
preferably there should be possibilities to give them personality and capabilities to 
act on their own or on behalf of a user of the avatar or owner of the environment. 
That is, they need appropriate internal modeling to allow autonomous behavior. 



3.2.1 Interacting Embodied Personalities 

Agent technology is a research field that emerged in the 1990s and that can be
considered a field in which actors are developed, although not necessarily in
the context of human-computer interaction or virtual communities. Without going
into details, and especially controversial details, we want to mention the
properties that software modules are generally assumed to have before one is
‘allowed’ to talk about them as agents: autonomy, reactive and proactive
behavior, and the ability to interact with other agents (or humans). For an
agent to act appropriately in a domain, it has proved useful to distinguish
beliefs (what the agent regards to be true; this may change over time), desires
(the goals the agent has committed itself to) and intentions (short-term plans
that it tries to execute).
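
The distinction between beliefs, desires and intentions can be made concrete
with a toy agent. This is a minimal sketch and not a model of any particular
agent architecture; the class name BDIAgent, the plan library and the example
goal are hypothetical.

```python
class BDIAgent:
    """Toy agent that keeps beliefs, desires and intentions separate."""

    def __init__(self, plans):
        self.beliefs = {}      # what the agent currently regards as true
        self.desires = []      # goals the agent has committed itself to
        self.intentions = []   # short-term plan steps still to be executed
        self.plans = plans     # hypothetical plan library: goal -> steps

    def perceive(self, facts):
        # Beliefs may change over time as new observations arrive.
        self.beliefs.update(facts)

    def deliberate(self):
        # Keep the current plan if one is in progress; otherwise adopt a
        # plan for the first desire that is not yet believed to be achieved.
        if self.intentions:
            return
        for goal in self.desires:
            if not self.beliefs.get(goal, False) and goal in self.plans:
                self.intentions = list(self.plans[goal])
                return

    def step(self):
        """Run one cycle and return the executed action, or None."""
        self.deliberate()
        return self.intentions.pop(0) if self.intentions else None
```

For example, an agent with the desire "ticket_booked" and the plan
["ask_date", "reserve_seat"] executes those two steps in turn and then stays
idle once it believes the goal has been achieved.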

Believability is a notion that has been emphasized by Joseph Bates, again in
the early 1990s. An agent is called believable if some version of a personality
shows in the interaction with a human. Two main theories of personality that
can be used to design believable agents are trait theory, where personality is
a set of psychological traits that characterizes a person’s behavior, and
social learning theory, where appraisal of the situation and the individual’s
history are taken into account. The main requirements for believability are
(Loyall [8]): personality, emotion, self-motivation, change, social
relationships and consistency of expression.

When we zoom in on the role of emotions, it should be mentioned that there are 
many subtleties involved when conveying them. Cartoon characters are allowed to 
exaggerate, giving more cues to the observer. Emotional cues shouldn’t be in con- 
flict with contextual cues. Emotional cues should be consistent during interaction; 
nevertheless they may change when interaction has taken place with the same
user over a longer period of time. Computational models from which emotional
behavior can be generated exist, but are not based on well-developed theory.
Therefore, rather than emergent emotional behavior based on an agent’s
cognitive appraisal model, we see applications in prototype (learning)
environments with preprogrammed emotional display.

Now that we have discussed reasonable, social, intelligent, believable and, in- 
deed, whatever kind of cognitive behavior, it is time to consider the role of em- 
bodiment. Embodiment allows more agent multimodality, therefore making inter- 
action more natural and robust. Several authors have investigated nonverbal 
behavior among humans and the role and use of nonverbal behavior to support 
human-computer interaction. See, e.g., Cassell [1] for a collection of chapters on
properties and impact of embodied conversational agents (with an emphasis on 
coherent facial expressions, gestures, intonation, posture and gaze in communica- 
tion) and for the role of embodiment (and small talk) on fostering self-disclosure 
and trust building. While the previous investigations we mentioned can be under- 
stood to emphasize the cognitive viewpoint of embodiment, we can as much em- 
phasize the possibility of an embodied agent to walk around, to point at objects in 
a visualized domain, to manipulate objects or to change a visualized (virtual)
environment. In these cases the embodiment can provide a point of focus for
interaction. From a technical point of view, a great deal remains to be done on
human-like (from a physical and cognitive point of view) agent behavior. From a
domain point of view it has to be decided when and why such behavior is useful.

Our next step is from embodiment to virtual humans. A list of research topics 
involved includes natural looking movement and deformation of visible body sur- 
face, animation of skeleton, hands and face, hair, skin and clothes representation, 
natural looking walking and grasping animation and, very importantly in view
of the previous topics, behavioral animation, which strives to give character
and personality to the animation. This list of viewpoints can be complemented
with viewpoints from the cognitive and perceptual sciences. Virtual humans have
to act in
virtual environments where a visual, an auditory and a haptic/kinaesthetic envi- 
ronment intersect. 



3.2.2 Embodied Personalities in Virtual Worlds 

Agents are finding their way into virtual environments. The first applications
of embodied agents can be found in training, simulation, education and
entertainment. These environments may include a single agent with which the
user can interact, but the user, or part of the user, can also be included in
the environment. In team training we can have several agents in the
environment, or several users can be represented in the environment. Research
into crowd modeling also studies the behavior of groups of people in virtual
environments.

However, apart from these applications we also see developments where 2D 
and 3D extensions of chat worlds and digital cities become inhabited by embodied 
agents, both as representations of visitors and as autonomous domain agents. In 
the near future we can expect that companies, families or groups of people that 
share interests have the opportunity to design and use such environments. Below 
we mention a few projects in which these future developments become visible.

Several impressive research systems employing animated pedagogical agents
have been built and are undergoing further development. Embodied pedagogical
agents can show how to manipulate objects, they can demonstrate tasks and
they can employ gesture to focus attention. As such they can give more custom- 
ized advice in an information-rich environment. Lester et al. [7] use the term deic- 
tic believability for agents that are situated in a world that they co-inhabit with 
students and in which they use their knowledge of the world, their relative location 
and their previous actions to create natural deictic gestures, motions, and utter- 
ances. 

One example of an environment that employs embodied agents is the Soar 
Training Expert for Virtual Environments (STEVE, see Johnson et al. [5]). This is 
an immersive 3-D learning environment with a virtual agent called Steve. Steve 
demonstrates how to perform a physical, procedural task. It is a typical example of 
an environment where a student can get hands-on experience. Due to the student’s 
head-mounted display, Steve's perception module knows about the student’s posi- 
tion in the virtual world, about the student’s line of sight and which objects are in 
the student's field of view. Steve has been designed to support team training. 

A second example we want to mention is BodyChat (Vilhjalmsson [14]), a
research environment on conversational embodied agents. That is, there is not
really a task to be performed or learned. People exchange information and chat.
In this environment several users can have a conversation using the keyboard
while their cartoon-like 3D animated avatars display corresponding salutations
and turn-taking behavior. They look away while planning an utterance, they give
back-channel feedback and facial expressions, and they look at the next speaker
when ending a turn. Watanabe [15] reports on similar research. Another system
by Vilhjalmsson, called Situated Chat, is in development. This system also
animates avatars in an
online graphical chat environment. However, since it knows about the shared vis- 
ual environment the generation of avatar movements can include referring ges- 
tures when making implicit or explicit references to the environment during the 
conversation. 

As a third range of examples we look at systems that have become known as 
interactive theater, where players connected by a network can take part in a per- 
formance as actors. There is a host server for the producer and there are client 
computers for the performers. The latter are represented as avatars in the virtual 
environment and with motion capture systems (cameras or sensors) avatar move- 
ments reflect player actions. Gestures, touch and facial expressions of the players 
can be tracked and given to the animation algorithms. The virtual stage may have 
actors that are provided by the theater and that show autonomous behavior ac- 
cording to some action patterns. They have a role, but the way they perform this 
role is also determined in interactions with the human players and their alter ego 
avatars. See Takahashi et al. [12] and Tosa et al. [13] for examples of interactive 
theater.



3.3 Building a Theater Environment

The main theater building in our university town is called ‘Het Muziek Centrum’. 
It includes the usual rooms: performance halls, dressing rooms for artists, recrea- 
tional locations (for the audience and performers), wardrobes, etcetera. It also in- 
cludes a music academy. There are also some other theater buildings in the town. 
At this moment some of the buildings, their surroundings and the streets leading 
from one location to the other are being modeled in VRML and Java 3D. The vir- 
tual theater was built according to the design drawings of the architects of the real 
building. Originally the environment was built around an already existing natural 
language dialogue system that provides information about theater performances 
and that allows reservations to be made. In the virtual environment the dialogue 
system has been assigned to a visualized embodied agent. Once we had this agent
and extended the environment, the need arose to add other agents that were
able to help the visitor. This raised our interest in having these agents
communicate with each other as well and in endowing them with some form of
autonomous behavior. Rather than developing into a goal-directed information
and transaction system comparable to a voice-only telephone information system,
the environment is now evolving into a virtual community where differences
between visitors and artificial agents become blurred. Research topics show a
wide variety, including assigning personalities and emotions to artificial
agents, usability studies involving a navigational assistant, formal
specification of (interactions in) virtual environments, and reinforcement
learning for agents in this multimodal environment to increase their autonomy.

When we enter our Virtual Muziek Centrum, we see the information agent 
called Karin, waiting to tell us about performances, artists and available tickets. 
Visitors can explore this virtual environment, walking from one location to an- 
other, looking at posters, clicking on objects and so on. Karin can be asked natural 
language questions about performances in the theater. She has access to a database 
containing all the performances in the various theaters during the current season. 
Karin has a 3-D face that allows simple facial expressions and simple lip
movements that are synchronized with a text-to-speech system mouthing the
system’s utterances to the user (see Nijholt & Hulstijn [9] for details). Other
agents have been introduced in this environment. One example is a navigation
agent that knows about the geography of the building and that can be addressed
using typed natural language utterances. The visitor can ask the agent about
existing locations in
the theater. When the request is understood, a route is computed and the viewpoint 
in the world is guided along this route to the destination. The navigation agent has 
not been visualized as a 3D embodied agent. 
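
The route computation step can be sketched as a search over a graph of
locations. Breadth-first search is used here purely as an illustration; the
text does not say which algorithm the navigation agent uses, and the function
name compute_route and any location names passed to it are hypothetical.

```python
from collections import deque


def compute_route(graph, start, goal):
    """Return a route with the fewest hops as a list of locations, or None.

    graph maps each location to the list of directly reachable locations.
    """
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        here = queue.popleft()
        if here == goal:
            # Walk back through the predecessors to rebuild the route.
            route = []
            while here is not None:
                route.append(here)
                here = previous[here]
            return route[::-1]
        for neighbor in graph.get(here, ()):
            if neighbor not in previous:
                previous[neighbor] = here
                queue.append(neighbor)
    return None
```

The returned list of locations is the kind of route along which the viewpoint
could then be guided, one location after another, to the destination.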

A Java based agent framework has been introduced to provide the protocol for 
communication between agents and the introduction of other agents. For example, 
why not allow the visitor to talk to the map of the seats in the main concert hall or 
to a poster displaying an interesting performance? In fact, we can have a multitude 
of potential and useful agents in our environment, where some just trigger an ani- 
mation, others can walk around and others have built-in intelligence that allows 
them to execute certain actions based on interactions with visitors. Some of
the 3D avatars that live in our environment (a baroque dancer, a piano player)
have not yet been incorporated in the framework in a way that lets visitors
communicate with them. We have been experimenting with embedding our
environment in a multi-user shell (Reitmayr et al. [11]) that allows us to
host multiple visitors who can make themselves visible to each other as
avatars (VRML objects). These avatars
move along with the visitor, but they can also be assigned animations, intelligence 
and interaction abilities. Hence, we can have different human-like agents. Some of 
them are autonomous embodied agents standing or moving around in the virtual 
world and allowing interaction with visitors of the environment. Others represent 
human visitors of the environment. We want any visitor to be able to communicate 
with autonomous agents and visitors, whether visualized or not. That means we 
can have interactions between agents, between visitors, and between visitors and 
agents. This is a rather ambitious goal that cannot yet be fully realized.



3.4 Interacting about Performances and Environment 

How does interaction between domain agents and visitors take place? We decided 
to introduce a model of natural language interaction between Karin and the user that is
rather primitive from a linguistic point of view, but sufficiently intelligent from a 
practical and pragmatic point of view. This natural language understanding system 
mediates between the user and a database containing information about perform- 
ances, artists and prices. Although the ‘linguistic intelligence’ is rather poor, the 
outcome of a linguistic analysis can be passed on to pragmatic modules that pro- 
duce relevant system responses in the majority of cases. The system prompts make 
users adapt their behavior to the system. Karin presents her information using text- 
to-speech synthesis and lip movements. When there are too many performances to 
read out, she presents a table and draws the user’s attention to this table using eye 
movement and a natural language utterance. The dialogue system can interpret and 
generate references to items in this table. 

It may be clear how to address Karin. However, visitors may want to address 
other domain agents and agents that represent users. As mentioned, this is work in 
progress. We are following several approaches to solve this problem. They are re- 
lated and can be integrated since all of them are agent-oriented and based on a 
common framework of communicating agents. In addition, we have built this 
framework in such a way that different agents with different abilities can become 
part of it: a simple animated piano player, a baroque dancer that ‘understands’ the 
music she is dancing to, Karin, who knows about theater performances, and a
navigation agent that knows about the geography of the building. 

Developing navigation agents leads to a number of questions. How can we 
build navigation intelligence into an agent? What does navigation intelligence 
mean? How can we connect this intelligence to language and vision intelligence? 
Visitors of our environment are language users and, moreover, they know and in- 
terpret what they see. There is a continuous interaction between verbal and non- 
verbal information when interpreting a situation in our virtual environment. This 
interaction and the representation and interpretation of sources and then the
generation of multimedia from them are among the main topics of our research.

We closely follow Darken & Silbert [3] in our approach to navigation. To
assist the visitor in navigating through our virtual theater, we have added both a 
map and an intelligent navigation agent. The visitor can ask questions, give com- 
mands and provide information when prompted by the agent. This is done by typ- 
ing natural language utterances or by moving the mouse pointer over the map to 
locations and objects the user is interested in. On the map the user can find the 
performance halls, the lounges and bars, selling points, information desks and 
other interesting locations and objects. The current position of the visitor in the 
virtual environment is marked on the map. While moving in VR the visitor can 
check his or her position on this map. When using the mouse to point at a position 
on the map, references can be made by both user (in natural language) and system 
to the object or location pointed at. We have annotated a small corpus of example 
utterances that appear in navigation dialogues. An example of a question is:
“What is this?” while pointing at an object on the map, or “Is there an
entrance for wheelchairs?”. Examples of commands are “Bring me there.” or
“Bring me to the information desk.” Examples of short phrases are “No, that
one.” or “Karin.” From the annotated corpus a grammar was induced, and our
unification-type parser for
Dutch can be used to parse these utterances into feature structures. Three agents 
communicate to fill in missing information in the feature structure and to deter- 
mine the action that has to be undertaken (answering the question, prompting for 
clarification or missing information, displaying a route on the map or guiding the 
user in VR to a certain position). The navigation agent, the dialogue manager and 
the Cosmo Agent do this in co-operation. Not yet implemented is the ability to
track not only the position of the visitor but also what is in the visitor’s
line of sight. This will allow interpretation of references to objects that
are visible to a visitor.
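
Filling in missing information in a feature structure can be illustrated with
a simple unification of nested dictionaries. This is a generic sketch and not
the authors' parser or agent protocol; the function name unify and the feature
names used with it are hypothetical.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Features present in only one structure are copied over; features present
    in both must agree. Returns the merged structure, or None on a clash.
    """
    result = dict(a)
    for key, value in b.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            merged = unify(result[key], value)
            if merged is None:
                return None
            result[key] = merged
        elif result[key] != value:
            return None
    return result
```

For an utterance such as “Bring me there.”, the parse might yield
{"act": "command", "action": "guide"} while the pointing gesture contributes
{"destination": {"name": "information desk"}}; unifying the two gives a
complete structure from which an action can be determined.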



3.5 Towards a Theater Community 

The length of this paper does not allow a comprehensive survey of all the prob- 
lems we have to deal with when we want an agent-oriented design of our envi- 
ronment and have it inhabited by agents that can be embodied, have intelligence 
and personality and can communicate with each other and with agents that repre- 
sent visitors. To design and maintain an environment like that we need some uni- 
formity from which we can diverge in several directions: agent intelligence, agent 
interaction capabilities, agent visualization and agent animation (cf. Nijholt & 
Hondorp [10]). Standards are needed to allow frameworks for communication, 
internal modelling, and animation of embodied agents. These standards should 
also address issues concerned with multi-user and multi-developer environments. 

In Egges et al. [4] we introduce an approach to the internal modelling of agents 
we think we can use in our multi-agent and multi-user environment. Our approach,
discussed there, is limited, but nevertheless allows modeling of ‘intelligence’ in
terms of beliefs, desires and plans, and possible extensions to the modeling of 
emotions and an agent’s knowledge about movements, postures and non-verbal 
communication. Our current emotion research is reported in Kesteren et al. [6] and 
Bui et al. [2]. 
4.1 Importance of Collaboration: Practical and Scientific

We live in an increasingly interconnected world. In reflection of this trend, the 
field of human-computer interaction has shifted focus from individuals to teams 
and large organizations [35]. From a scientific perspective, we learn most about 
the object of study during transitions. Thus, a learning test is generally more diag- 
nostic of brain function than a test of stored knowledge; a glucose tolerance test 
tells us more than a resting blood sugar level; a stress test reveals more about the 
heart than does resting heart rate. Similarly, this century’s rapid transitions should 
allow us to learn a great deal about collective human behavior. At the same time, 
we face enormous planetary problems including global fouling of the ecosphere, 
inequity in economic opportunity, increased chances for catastrophic disease, and 
international terrorism. These problems arose with current approaches and limita- 
tions to collaboration and will only be solved via breakthroughs in collaboration. 

From a more mundane viewpoint, similar challenges exist today for large, in- 
ternational organizations. For instance, the world is changing more quickly but 
creative design ability has not increased. As a result, there is a widening gap be- 
tween the degree of flexibility and creativity needed to adapt and the capacity of 
individuals and organizations to do so [12]. Design problems are often extremely 
high leverage for organizations. For instance, errors in design, whether in soft- 
ware, drugs, business processes, or automobiles are extremely costly. Conversely, 
effective and innovative designs can be extremely lucrative and are a hallmark of
long-lived companies [7, 10]. Even a modest increase in the ability of organiza- 
tions to create more effective designs could greatly increase profits in existing 
markets and create whole new markets. Increasing design effectiveness will re- 
quire collaboration breakthroughs. 

Human beings evolved natural language as a method of collaboration among 
small groups of people who generally shared context, goals, experience and cul- 
ture. Under those circumstances, sequential human speech served fairly well, e.g., 
the telling of stories for sharing experiences [34]. However, unaided speech is not 
well-suited to large-scale collaborations, particularly not when the people involved
have vastly different assumptions, cultural backgrounds, goals, contexts,
experiences and native languages. We have not yet invented an entirely effective 



T. Terano et al. (Eds.): JSAI 2001 Workshops, LNAI 2253, pp. 27-34, 2001.
© Springer-Verlag Berlin Heidelberg 2001







J.C. Thomas 



replacement of natural language for large, diverse groups though storytelling can 
be useful in bridging gaps among groups when incorporated into the appropriate 
process [3, 4, 37]. Can we further extend such techniques to facilitate communi- 
cation among larger, more diverse groups? Or, should we limit such interactions to 
"dry" interactions [2]? 

One of the special challenges offered by collaboration today is that it often involves
remote participants, sometimes worldwide [25]. In many conversations and
papers, an implicit assumption is that remote collaboration is limited by bandwidth 
alone and that the current superiority of face to face over remote collaboration will 
disappear once bandwidth becomes large enough. Such an analysis overlooks two 
additional and potentially quite important aspects of face to face collaboration. 

First, face to face collaboration allows people to see and experience the physi- 
cal and social context of their collaborators. Perhaps they see the building where 
others work; try the same food; find out whether they work in a quiet or noisy en- 
vironment; what the moods are of those that pass by in the hallways. Second, 
sharing an actual physical space allows the possibility of much deeper interaction 
and that possibility may well affect trust even if the possibility never materializes. 
Consider two rather extreme examples. First, two people sharing a physical space 
may be subject to a natural disaster such as an earthquake and one may save the 
life of the other. Although obviously a very low probability event, the mere possi- 
bility may well put people’s perceptual and emotional apparatus into a heightened 
state of arousal. Second, if two people share a common physical space, one could 
physically injure the other. Since A’s trust of B is enhanced by situations wherein 
A could hurt B but in fact, does not, the typical face to face interaction may en- 
hance trust in just this way. 

It is not only the medium and context of communication that impact collabora- 
tion, but also the content. In particular, we argue that expressive communication 
may offer an opportunity for collaborators to gain more comprehensive models of 
each other than instrumental communication alone. Instrumental communication 
is communication that is required to accomplish the current task. Expressive 
communication is communication that tells about the communicator as well as the 
subject; it is communicated more because the communicator wants to than be- 
cause they need to. 

Zheng, Bos, Olson, and Olson [38] showed that collaboration and trust can be, 
in effect, "jump-started" with social chitchat. Stories can also help people develop 
more trust than the exchange of information per se. A story is not simply an ob- 
jective recounting of events; it always implies a number of revealing choices. The 
storyteller chooses which events to talk about; where to start; tone; viewpoint; 
which details to describe and so on. Through such choices, the storyteller inevita- 
bly reveals themselves as well as the subject. 

So long as collaboration proceeds along predictable lines, models built from 
expressive communication may be unnecessary. But, if standard procedures break 
down, then collaborators who have developed more complex models of each other 
will be able to react more effectively and efficiently as a team. Of course, there is 
also a danger here. As perhaps hinted at by Azechi [2], stories might also reveal 
characteristics of the storyteller that other collaborators might find quite negative
while purely instrumental communications are unlikely to do so. 

A challenge for knowledge socialization is to determine the conditions under 
which it is better to keep communications “dry” or “instrumental” and when it is 
desirable to include more expressive or “wet” modes of communication. If the 
latter is necessary, we also need to develop methods of progressive disclosure that 
minimize friction and maximize empathy. 



4.2 New Technological Possibilities 

Recent advances in computing power, interface technologies, bandwidth, storage,
and social engineering provide many possibilities by which novel solutions to
large-scale collaboration may be designed, tested, and improved. In the "real world",
effective on-line collaboration, both at a distance [16] and face-to-face [17], is
already being facilitated by technology. We believe further advances can be made
by incorporating creativity aids, suggestions for processes [33], and by providing 
tools for alternative representations [31]. 

Failure to innovate is not random, but can be ascribed to one of several main 
difficulties: 1. Individuals or groups do not engage in effective and efficient proc- 
esses of innovative design. 2. The necessary skills, talents, and knowledge sources 
are not brought to bear on the problem. 3. Appropriate representations of the 
situation are not used. Laboratory [6, 15, 29] as well as field research [24, 36] has 
established that the major process difficulties are mainly due to a limited number 
of preventable errors. 

An appropriate overall structure may facilitate groups through steps of innova- 
tion and help guide these separate steps; distinct guidelines are appropriate within 
each of these steps [28, 33]. A common problem is that people typically fail to 
spend sufficient time in the early stages of design; viz., problem finding and 
problem formulation [27]. A common failure during a specific stage of innovative
design is that people often bring critical judgment into play too early in the idea 
generation phase of problem solving. As another example, unlike Newell and 
Simon's [22] normative model of ideal problem solving, in fact, people's behavior 
is path-dependent and they are often unwilling to take what appears to be a step 
that undoes a previous action even if that step is actually necessary for a solution 
[29]. 

Regarding the second issue (bringing to bear necessary skills, talents and 
knowledge sources), while software tools cannot fully substitute for human ex- 
perts, evidence suggests that individuals have a large amount of relevant implicit 
knowledge which they often will not bring to bear on a problem and that giving 
appropriate strategies [29], or knowledge sources [30] can help. 

Regarding the third issue of appropriate representation, controlled laboratory 
experiments have shown that subjects did significantly better, for example, in a 
temporal design task when they used a spatial representation; yet, very few sub- 
jects spontaneously adopted such a representation [6]. The impact of good repre- 
sentations, however, is not confined to laboratory demonstrations. Speech research 
advancements accelerated greatly when waveforms were largely replaced with
speech spectrograms and Feynman diagrams allowed breakthroughs in atomic 
physics. By providing people with a variety of potential representations and some 
processes to encourage the exploration of various alternatives, we could probably 
improve performance significantly. 

Advances in speech recognition, combined with natural language processing 
and data mining raise the possibility of large-scale real time collaborations. 
Speech recognition can turn raw speech into text. Statistical techniques can automate
the formation of "affinity groups" that share various interests, values, or
goals [23]. Speech recognition, in this context, need not produce perfect tran- 
scripts of what is said but only transcribe enough content to enable natural lan- 
guage processing software to cluster segments of text. 
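
As a rough sketch of how imperfect transcripts might still be clustered, the following groups text segments by word overlap. The bag-of-words representation, the cosine measure, the greedy single-pass grouping, and the threshold value are all assumptions made for illustration; the statistical techniques referred to in [23] may differ.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; tolerant of noisy transcripts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(segments, threshold=0.3):
    """Greedy single-pass clustering; returns lists of segment indices."""
    clusters = []  # each entry: (summed word counts, member indices)
    for index, segment in enumerate(segments):
        vector = bag_of_words(segment)
        best, best_similarity = None, threshold
        for candidate in clusters:
            similarity = cosine(vector, candidate[0])
            if similarity >= best_similarity:
                best, best_similarity = candidate, similarity
        if best is None:
            clusters.append((Counter(vector), [index]))
        else:
            best[0].update(vector)  # fold the segment into the cluster profile
            best[1].append(index)
    return [members for _, members in clusters]


segments = [
    "work life balance matters",
    "balance between work and life",
    "remote working tools",
    "tools for working remotely",
]
groups = cluster(segments)
```

Because only shared vocabulary matters here, a transcript with some recognition errors can still land in the right group, which is the property the paragraph above relies on.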

Additional benefits stem from a speech to text to clustering system. In the past, 
conversations were transient. There was no "objective" evidence of their content
or structure. It often happens, e.g., in a group meeting, that the first person to raise
a new idea is not recognized as having done so. Instead, the second or third person
to mention the idea is often credited with it, quite possibly because the first mention
is unassimilable by the current mental model of the listeners but causes a
change in mental models so that a subsequent mention is comprehensible. The 
more general point is that computerized records of group meetings and larger scale 
collaborations allow the possibility of feeding back to the participants various 
visualizations of behavior, making the computer an active participant in group 
communication [32]. In conjunction with effectiveness metrics, such feedback 
mechanisms may allow groups to improve effectiveness. 

At IBM, we recently engaged in a corporate-wide experiment called "WorldJam"
wherein all IBMers worldwide were invited to a three-day electronic meeting
to discuss ten issues of interest to IBMers including employee retention, work-life
balance, and working remotely. Over 52,600 employees participated and
posted over 6000 suggestions and comments.

Each topic had a moderator and facilitators. Each moderator, in turn, had been 
asked to assemble a topic-knowledgeable "Board of Advisors" to provide refer- 
ences, websites, and other relevant materials ahead of time as well as participation 
during the on-line conference. In addition, the set of moderators and facilitators 
communicated with each other through a system called "Babble" which was de- 
signed, developed, and deployed at IBM Research. The Babble system blends 
synchronous and asynchronous text communication. Individuals in the system are 
represented as colored dots. The position of a dot within a simple visualization 
called a "social proxy" allows each participant to quickly see who else is present 
and which topics are being discussed. When a user of the system types an entry or 
scrolls through recorded discussion, their dot moves to the center of the social 
proxy for that topic. Several "Babbles" are now active within IBM including one 
for "Community Builders"; that is, people in various organizations throughout 
IBM interested in the process, tools, and methods for community building; "KM 
Blue" which includes a similar cross-organizational group interested in knowledge 
management and "Designers" which brings together people whose primary professional
identification is as a designer. In the case of WorldJam, Babble enabled the
moderators and facilitators to trade best practices and engage in joint problem 
solving in a timely manner. Additional information about the features, functions, and
design rationale of Babble, and empirical studies of it, is available in [13, 14].
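
The behavior of the social proxy described above can be sketched as a small model in which each participant's dot has a distance from the center of a topic's circle. The text only states that activity moves a dot to the center; the outward drift during inactivity, the class name, and the numeric values below are assumptions added for illustration and are not taken from the Babble implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SocialProxy:
    """Hypothetical model of one topic's social proxy."""
    topic: str
    outer_radius: float = 1.0
    drift_per_tick: float = 0.1  # assumed: idle dots drift outward
    radii: dict = field(default_factory=dict)  # user -> distance from center

    def join(self, user):
        """A newly present user starts at the edge of the circle."""
        self.radii[user] = self.outer_radius

    def activity(self, user):
        """Typing or scrolling moves the user's dot to the center."""
        self.radii[user] = 0.0

    def tick(self):
        """Advance time by one step; every dot drifts toward the edge."""
        for user, radius in self.radii.items():
            self.radii[user] = min(self.outer_radius, radius + self.drift_per_tick)


proxy = SocialProxy(topic="Community Builders")
proxy.join("moderator")
proxy.join("facilitator")
proxy.activity("moderator")
```

A renderer would then place each dot at its radius around the circle, so that a glance shows who is present and who has been active recently.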

In earlier work, we showed that the introduction of problem solving aids to 
break set increased performance and creativity [30] and that instructions to take on 
multiple viewpoints increased problems found in heuristic evaluation of a software 
design [11]. Multiple viewpoints have been quite consciously used by
the Iroquois (and other cultures) for thousands of years [36]. Other writers on 
creativity have suggested similar methods [9, 28]. 



4.3 Work of the Knowledge Socialization Group 

The work of our own group obviously relates to a tiny area of the vast space out- 
lined above. Our work comprises several interlaced threads. In one thread, we are 
conceptualizing, designing, and building tools to support the creation, capture, or- 
ganization, understanding, and utilization of stories as a method for groups to 
build and share knowledge. In the "Value Miner", e.g., natural language process- 
ing methods are used to find values as expressed in text. This could be applied to 
conversations, documents, and web-sites as well as stories. The Value Miner finds 
value-related words and phrases and tries to categorize these. A related, "Point Of 
View" tool shows the value similarities and differences of participants. We are 
also working on story visualizations aimed at helping individuals and groups cre- 
ate, understand, and find stories relevant to a situation at hand. For example, in 
one line of development, we are showing timelines of plot points and character 
development. In another line of representation research, we show a top level view 
of the kinds of attributes that are used to describe characters. By clicking on a top 
level view, the user may zoom onto the value associated with that attribute and ul- 
timately to the underlying text. In addition to visualizations, there are guidelines 
and measures based on known heuristics of story writing that can be incorporated 
into groupware [18, 21]. 

In order to provide a common underpinning for the various story related tools 
that we have developed, we have proposed a first pass at a "StoryML"; that is, a 
markup language specifically geared toward stories. In this representation, there 
are three different but related "views" of story: Story Form (what is in the story); 
Story Function (what are the purposes of the story); and Story Trace (what is the 
history of the story). In turn, the Story Form can be broken down into dimensions 
of Environment, Character, Plot, and Narrative. The idea of the StoryML is that it 
is expandable according to purpose. For some purposes, the user (e.g., a student 
studying mystery plots) may be satisfied with minimal detail concerning Function 
and Trace but need to expand certain aspects of the Story Form in great detail. In 
another context, a different user (e.g., a historian comparing certain themes across 
time and cultures) might have a very high level view of Story Form and Story 
Function but want to provide a detailed description of Story Trace. At this point, 
the meta-data in StoryML must be supplied by a knowledgeable human being. 
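
To make the three views concrete, the sketch below builds a small XML document with one element per view and one child per Story Form dimension. The element and attribute names are hypothetical, since the actual StoryML vocabulary is not given here, and the sample values are illustrative only.

```python
import xml.etree.ElementTree as ET

# The four Story Form dimensions named in the text.
FORM_DIMENSIONS = ("environment", "character", "plot", "narrative")


def build_story(title, form, function="", trace=""):
    """Build a story element with Form, Function, and Trace views."""
    story = ET.Element("story", {"title": title})
    form_element = ET.SubElement(story, "storyForm")
    for dimension in FORM_DIMENSIONS:
        child = ET.SubElement(form_element, dimension)
        # Unspecified dimensions stay empty and can be expanded later.
        child.text = form.get(dimension, "")
    ET.SubElement(story, "storyFunction").text = function
    ET.SubElement(story, "storyTrace").text = trace
    return story


story = build_story(
    "The Trojan Horse",
    form={"character": "Odysseus", "plot": "warriors hidden inside a gift"},
    function="illustrates a concealment strategy",
)
markup = ET.tostring(story, encoding="unicode")
```

Leaving a view or dimension empty corresponds to the "expandable according to purpose" idea: a user fills in detail only where their purpose requires it.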

Once a base of potentially useful stories becomes large in any one collection or 
domain, it can become a challenge to find the "right" story or stories. If one is 
looking for stories with particular objects, people, or places in them, "keyword in
context" searches are generally sufficient. But, if one is looking for stories about 
activities, a more subtle approach is required. In response to this challenge, we 
have developed a script-based story browser. The "script" is a default set of pa- 
rameters about an activity; it may specify roles, goals, objects, and a sequence of 
events. In the story browser, a user may choose an activity and find stories related 
to that activity or related activities through a combination of searching and 
browsing. Although this activity-based search works at a higher level of semantics 
than typical searches, in many cases, a person is searching for a story that illus- 
trates a particular kind of very abstract point and even the particular activity is not 
that important. For instance, the story of Odysseus hiding his warriors in The 
Trojan Horse may be applicable in a wide variety of domains such as disease con- 
trol or computer security. In such cases, to find stories that are potentially applica- 
ble, we really need a system based on abstract planning and problem solving 
strategies. In our lab, Andrew Gordon [20] has developed such an ontology for ab- 
stract planning and problem solving by interviewing experts and reading strategy 
books in a wide variety of domains and then formulating these strategies in ab- 
stract terms. In the next step, these terms can be used to categorize stories accord- 
ing to the strategies that are utilized. This will enable individual problem solvers, 
educators, and teams to find stories that are potentially applicable to improving 
specific situations or solving particular problems. 
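
The activity-based lookup described above might be sketched as follows. The `Script` fields mirror the parameters named in the text (roles, goals, objects, events); the `related` field, the `Story` class, and the sample data are assumptions introduced for illustration and do not describe the actual browser.

```python
from dataclasses import dataclass, field


@dataclass
class Script:
    """Default parameters about an activity."""
    activity: str
    roles: list
    goals: list
    objects: list
    events: list
    related: list = field(default_factory=list)  # assumed: related activities


@dataclass
class Story:
    title: str
    activities: set


def find_stories(activity, scripts, stories, include_related=True):
    """Return titles of stories about an activity or its related activities."""
    wanted = {activity}
    if include_related and activity in scripts:
        wanted.update(scripts[activity].related)
    return [story.title for story in stories if story.activities & wanted]


scripts = {
    "negotiating": Script(
        activity="negotiating",
        roles=["buyer", "seller"],
        goals=["reach agreement"],
        objects=["contract"],
        events=["offer", "counter-offer", "agreement"],
        related=["bargaining"],
    )
}
stories = [
    Story("The Long Contract", {"negotiating"}),
    Story("Market Day", {"bargaining"}),
    Story("The Storm", {"sailing"}),
]
found = find_stories("negotiating", scripts, stories)
```

Searching by abstract strategy rather than by activity would need a different index, for example one keyed on the strategy terms of the ontology in [20].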

We are also engaged in attempting to extend the architect Christopher Alexan- 
der’s [1] concept of a Pattern Language to stories. A Pattern Language consists of 
a lattice of interrelated patterns. Each pattern has a Title, a description of a context 
in which a problem is likely to occur, a description of opposing forces, and the ba- 
sic outline of a solution. A pattern also often contains a diagram illustrating the 
basic solution, and may contain references or other evidence about its efficacy. 
Each pattern also includes links to higher level and lower level patterns. The no- 
tions of patterns and A Pattern Language have been applied to a variety of fields 
besides architecture including object-oriented programming [19], project structure 
[8] and human-computer interaction [5]. Typically, a Pattern Language is devel- 
oped by a community of practice as a way to create, organize and reuse knowl- 
edge. 

Our attempts to provide additional knowledge sources are focused mainly on 
teaching stories [34], particularly during specific stages of problem solving. For
example, the story "Who Speaks for Wolf" by Paula Underwood [36] is a story 
especially well-suited to either problem formulation or to a last minute check that 
all stakeholders’ concerns are covered before significant resources are committed 
to a particular plan. In other cases, the individual, team, or organization will need 
to use a story browser whose expanding capabilities are outlined above.

In this paper, we have attempted to do three things. 1. Convince the reader that 
improving and understanding the ability of individuals, teams, and organizations 
to innovate more effectively is key to our collective survival. 2. Outline how re- 
cent advances in science and technology offer a promise to enhance collaborative 
innovation. 3. Describe in outline the small contributions along these lines of the 
IBM Research Knowledge Socialization Group. 
5.1 Introduction

In today’s information technology (IT) and communication intensive environment,
people, technology and built environment designers, and organizations are challenged
to understand the impacts on the workspace, content that is created and
shared, and social, behavioral and cognitive aspects of work, play, learning, and 
community. The study is at the intersection of the design of physical spaces, i.e., 
bricks, rich electronic content such as video, audio, sketching, CAD, i.e., bits, and 
new ways people behave in communicative events using affordances of IT aug- 
mented spaces and content, i.e., interaction. The study proposes two hypotheses. 

Bricks & Bits & Interaction Hypothesis: If we understand the relationship
between bricks, bits, and interaction we will be able to 

1. design spaces that better afford communicative events, 

2. develop collaboration technologies based on natural idioms that best support the 
activities people perform, 

3. engage people in rich communicative experiences that enable them to immerse 
in their activity and forget about the technology that mediates the interaction. 

Change Hypothesis: Any new information and collaboration technology will
require change and rethinking of:

1. the design and location of spaces in which people work, learn, and play. 

2. the content people create in terms of representation, media, interrelation among 
the different media, the content’s evolution over time so that it provides context 
and sets it in a social communicative perspective. 

3. the interactions among people in terms of the individual’s behavior, interaction 
dynamics, new communication protocols, collaboration processes; relation be- 
tween people and affordances of the space; and interactivity with the content. 

The paper uses scenarios and two collaboration technology examples to discuss
the Bricks & Bits & Interaction perspective and highlights the behavioral and social
changes that have to be acquired as people interact with and in the context of
new communication technologies and IT augmented spaces. The two information 
and collaboration technologies are: 

1. MS Netmeeting, a collaboration technology for videoconferencing [1], 

2. RECALL™, a research prototype developed at the PBL Lab at Stanford [2].

The two scenarios took place in the context of the education testbed focused on
Global Teamwork in Architecture, Engineering, Construction (A/E/C) offered at
Stanford University [3]. The A/E/C program engages students from universities
worldwide in global teamwork: Stanford University, UC Berkeley, Georgia Tech, Kansas
University, and Cal Poly San Luis Obispo from the US; TU Delft from the Netherlands;
Bauhaus University from Germany; ETH Zurich and FHA from Switzerland; the University
of Ljubljana from Slovenia; and Aoyama Gakuin University from Japan.



5.2 Visibility, Awareness, and Interaction in Videoconference Space

Scenario. Synchronous multi-modal collaboration in a videoconference mediated 
team meeting between an architecture student at Berkeley and two students at 
Stanford, a structural engineer and an undergraduate apprentice, is used as a sce- 
nario to discuss the method and findings of the study. The study captured the in- 
teraction among the three actors by video taping both sites. About 40 hours of in- 
teractions were recorded and analyzed using video protocol analysis methods. 
Two key aspects were studied: the workspace and content aspects present in the 
process, and the interaction related to the social process and the discourse. 

Bricks. From the point of view of bricks the study analyzes the affordances and 
limitations of typical preset physical videoconference workspaces, e.g., labs or
cubicles. Moreover, the location of the PC and audio/video devices is fixed.
In such a videoconference setting one or more participants move, interact, and use 
the affordance of the technology and the space to communicate with remote team 
members. The research and pragmatic question is how can a flexible space be de- 
signed to accommodate the changing needs of the interaction, awareness and visi- 
bility of the distributed people engaged in the communicative event? 

The analysis focused on the environmental aspects present in the interaction, 
e.g., the analysis of the participants’ movements in the space. From the observa- 
tions, one important aspect in understanding part of the behavioral patterns of the 
participants was the study of the workspace used in the interactions. It was re- 
duced to the area surrounding the PC, creating a restricted interaction space. The 
affordances of the equipment used also determined the way in which the participants
used the space. In particular, two aspects were relevant to the way in which
people use the workspace: the locations of the monitor and the video camera. 

Both monitors and video cameras define preferred locations for participants
that narrow the possibilities for using a certain area in the working space. When
analyzing the movements of the participants in relation to the location of the
equipment, we can see that the movements are restricted to a triangular area that
we have called the Cone of Interaction (COI). Fig. 1 shows the movements of
two participants during a real interaction. We identified four major areas in the
COI: the Command area (A), in which the person who leads the interaction is
located (the position most likely has to do with the use of the input device, and it
is the other way around in the case of left-handed users); the Secondary area (B),
occupied by default by the other person or people involved in the interaction;
the Pointing device area (p); and the Microphone area (m).

Key aspects have to be considered in relation to the COI. One is the
overlapping of functional areas created by the video camera and the monitor. This
overlapping creates three zones as shown in Fig. 1: sector (1) defines the area
in which the user of the computer can see the screen and be captured
by the lens of the video camera; sector (3) is the area in which no visibility can
happen, both for the user of the computer and for the receiver of the image captured
by the video camera. Sector (2), however, is potentially the most problematic
of all, because in this area the user of the system can see
the computer screen but at the same time be out of the camera range without noticing
it, creating a visual contact failure in the communication process.
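
The three sectors can be illustrated with a simple geometric classifier. This is a sketch under strong assumptions that are not stated in the study: the camera is taken to sit on the monitor and share its viewing axis, both visibility ranges are modeled as symmetric angular cones, and the half-angle values are hypothetical.

```python
import math


def classify_sector(x, y, screen_half_angle=60.0, camera_half_angle=30.0):
    """Classify a position (meters, monitor at origin facing +y) into sector 1, 2, or 3.

    1: can see the screen and is captured by the camera
    2: can see the screen but is outside the camera range
    3: no visibility in either direction
    """
    if y <= 0:
        return 3  # behind the monitor
    # Angle between the viewing axis and the line to the participant.
    angle = abs(math.degrees(math.atan2(x, y)))
    if angle <= camera_half_angle:
        return 1
    if angle <= screen_half_angle:
        return 2
    return 3


def at_risk_of_unseen_gestures(x, y):
    """True where a participant may believe they are visible but are not."""
    return classify_sector(x, y) == 2
```

A participant standing in sector 2 is exactly the case in which pointing at the screen goes unseen by the remote site.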



Fig. 1. Cone of Interaction and Areas of Visibility, Awareness, and Interaction 

The COI contributes to a false sense of awareness of the participant in the video 
interaction, by creating the wrong belief that by being in the visibility range of the 
screen, the actions performed will be transmitted to the non-collocated participant. 
An example of this wrong sense of awareness is an action that happens often when 
participants are describing information that involves pointing with their finger at 
graphical information on the monitor, termed “faked pointing” in the study.
“Faked pointing” can be considered a communication failure situation, because it
can lead to misunderstanding and delays in the communication process. The use
of body gestures for conveying the discourse is drastically constrained by the
affordances of the video devices in use, and the lack of awareness of this fact by the
speaker can lead to important losses in the communicational process. 

Bits. During a videoconference meeting, participants manipulate and edit rich
content such as text, 3D models in CAD, and sketches on whiteboards through
application sharing. Application sharing provides the participants with
interactivity and with visibility of ideas and actions in the application,
making their thought process visible, and lets them manipulate each other’s
content and explore alternatives, i.e., “what if” scenarios.

Interaction. The videoconference technology requires the participants to acquire
new communication skills and change their interaction habits to benefit from
the multi-modal communication environment. New communication protocols
emerge; for instance, since videoconference settings lack the rich cues of
collocated meetings, participants spend longer time intervals at the start of the
meeting to establish a framework and a rapport. The affordances or limitations of
the designed workspace and hardware configuration can lead to miscommunications,
formalized as communication failures of gaze, faked pointing, visibility of all
participants, and awareness of actions taken in the shared applications, in the
different situations of space usage by one, two, or more participants.
The video protocol analysis of the 40 hours of interaction captured led to the
identification of patterns of interaction based on the analysis of the verbal and
non-verbal discourses. The smallest unit of communication for this level of analysis
was the turn, defined by each intervention produced by any of the speakers in
the context of an interaction. The turns were grouped and structured into larger
units, forming three different levels inside the discourse’s structure:

1. Topics: the topics correspond to those identifiable themes raised by the speak- 
ers during the conversation. Different turns can share the same topic. 

2. Episodes: episodes are series of turns that share some specific functional 
content in the context of the discourse. These turns in the episode can belong 
to different kinds of topics. 

3. Protocols: protocols point out the existence of patterns in the communication 
between the participants that happen in the inner structure of the Episodes. A 
protocol is shaped by a particular series of turns, which form structures of
verbal and/or behavioral actions that can be identified as having a particular 
purpose in the context of the interaction. 

In order to evaluate this inner structure of the episodes, two kinds of analysis 
were applied to the texts. The first one was a technique called linkography [4], 
which shows graphically the relationship among the different topics present in the 
discourse. Linkography is useful for identifying characteristics of the verbal
interaction such as the areas enclosed by each episode, the connections between
the topics, and their recurrence. Fig. 2 shows one of the linkography graphics
produced. The recurrence of the topics is represented by several triangles; the
bigger the triangle, the farther the appearance of a topic is from the last time
it appeared in the interaction. This information was made more explicit by
augmenting the linkography method with color-coded triangles. The use of colors
made it possible to identify important characteristics of the topics, such as
their recurrence. The topics that recur most frequently, represented by small
series of pyramids, are the topics that constitute the core of the discussion.
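
The recurrence reading described above can be made concrete with a short computation. This is a sketch under stated assumptions, not the linkography tool used in the study: the input is assumed to be one topic label per turn, and `recurrence_links` and `core_topics` are hypothetical helper names. Each link records how many turns have passed since the topic last appeared, which corresponds to the size of the triangle.

```python
from collections import Counter
from typing import Dict, List, Tuple


def recurrence_links(topics: List[str]) -> List[Tuple[str, int, int, int]]:
    """For each reappearance of a topic, return (topic, previous, current, span)."""
    last_seen: Dict[str, int] = {}
    links = []
    for i, topic in enumerate(topics):
        if topic in last_seen:
            # span = distance in turns from the previous appearance
            links.append((topic, last_seen[topic], i, i - last_seen[topic]))
        last_seen[topic] = i
    return links


def core_topics(topics: List[str], top_n: int = 1) -> List[str]:
    """Topics that recur most often, i.e. candidates for the core of the discussion."""
    counts = Counter(topic for topic, _, _, _ in recurrence_links(topics))
    return [topic for topic, _ in counts.most_common(top_n)]
```

A topic with many short spans would be drawn as a series of small triangles, while a single large span marks a topic that returns after a long absence.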

The different episodes contained in the interaction were represented in a bar
graph in which the horizontal axis represents the temporal duration of the episodes
on a proportional scale. This is important for establishing ratios between the
weights of the different episodes in the context of the whole conversation. This
analysis enables the identification of the communication protocols present in the
different episodes (Fig. 2). Examples are the strips pointed out by the arrows
in the graphic, which represent the protocol for producing the transition between 
episodes. The study then linked the temporal graphic representation of the epi- 
sodes with the movement of the participants in the videoconference space (Fig. 3) 
to better understand the affordances and limitations of the bricks and bits during 
the communicative event and make preliminary recommendations as to how to 
change and improve the workspace, access to content, and interaction among the 
participants. In this example, the occurrence of the movements was analyzed in
time, by correlating it with the verbal interaction, and in space, in relation to
the computer’s location. By crossing the information about episodes and movements
in space, the study showed that the physical movements of the participants
increase once a second participant arrives. Nevertheless, the workspace
and the hardware configuration do not adapt in a flexible manner to respond to the
change in the number of participants or their location in relation to the hardware.
This can potentially lead to limited social interactions and cognitive experiences.
The spatial and movement analysis indicated the effect of the location of the video
camera on the use of space, i.e., the presence of the camera forces a diagonal
disposition of the participants during most of the interaction.
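
The ratios between episode weights mentioned above amount to simple arithmetic on the episode durations. The sketch below assumes each episode is given as a (name, start, end) triple in seconds; the function name and the episode labels in the usage note are illustrative, not taken from the study.

```python
from typing import List, Tuple


def episode_weights(
    episodes: List[Tuple[str, float, float]]
) -> List[Tuple[str, float]]:
    """Return each episode's share of the total conversation time."""
    durations = [(name, end - start) for name, start, end in episodes]
    total = sum(d for _, d in durations)
    if total <= 0:
        raise ValueError("episodes must cover a positive total duration")
    return [(name, d / total) for name, d in durations]
```

For instance, episodes lasting 60, 240, and 60 seconds receive weights of one sixth, two thirds, and one sixth, which are the proportions a bar graph with a proportional time axis would display.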




Fig. 2. Meeting Discourse Analysis: (a) Linkographic Representation and (b) Analysis of
Temporal Duration of Episodes




Fig. 3. Movement & Interaction in the Videoconference Workspace 



5.3 Mobile Learners in E-learning Spaces 

This section turns the attention from the analysis of fixed settings for interaction, in
which archived knowledge, information, and product models are shared during a
communicative event to facilitate teamwork and build common ground, to the
needs of mobile learners to capture informal knowledge in diverse formal and
informal e-learning spaces. The specific collaboration technology the study focused
on was RECALL™ [2]. RECALL™ is a learning and collaboration technology that
facilitates transparent and cost-effective capture, sharing, and re-use of knowledge.
RECALL™ is a Java drawing application that captures knowledge in informal
media such as sketches, audio, and video.

Scenarios. Two scenarios are offered to discuss the use of RECALL™ technology
and its relation to bricks & bits & interaction: Interactive Lectures and
Teamwork.

The questions raised by the bricks & bits & interaction perspective are:

- How does RECALL™ impact the workspace and the place where the interaction
activity can take place?

- How does rich content impact the level of retention in the interactive lecture 
scenario, and the quality of the communication in the teamwork scenario? 

- How does RECALL™ change the flow of communicative events? 







R. Fruchter 



Bricks. Space design has to take into consideration that brainstorming for new
ideas and team interaction do not necessarily have to take place in the office,
classroom, or lab; in fact, they often take place at the coffee house, airport
gateway, etc. The paper identifies the following work and learning spaces:

e-Space (electronic space) - a formal and flexible PBL Lab that supports the
diverse activities of mobile learners, such as lectures, presentations, teamwork,
and individual work. An example of a mobile, wireless, flexible PBL Lab space
augmented with RECALL™ was built at Stanford. The design of the PBL Lab was
grounded in cognitive and situative learning theory. The cognitive perspective
characterizes learning in terms of growth of conceptual understanding and general
strategies of thinking and understanding [5]. The design of the PBL Lab, which
provides team interaction with the professor, with industry mentors, and with team
owners, provides a structure for modeling and coaching that scaffolds the learning
process, both in the design and construction phases, as well as for techniques such as
articulating and reflecting on cognitive processes. The situative perspective shifts
the focus of analysis from individual behavior and cognition to larger systems that 
include individual agents interacting with each other and with other subsystems in 
the environment [6]. The PBL Lab is built as a flexible learning space that can be 
reconfigured by faculty or students on an as-needed basis to accommodate the dif- 
ferent learning and teaching activities (Fig. 4 a). 




(a) (b) (c) 

Fig. 4. Examples of e-Space, d-Space, and g-Space 

d-Space (distributed space) - an informal workspace that supports the mobile
learner with wireless connectivity to instructors, team members, and mentors. Fig.
4b illustrates an example of the PBL Lab wireless coffeehouse d-Space at Stanford
as a social, work, and learning space where learners get together and use their
mobile laptops augmented with RECALL™, videoconference, and other standard
applications used in projects.

g-Space (global space) - a formal and flexible PBL Lab that supports large
group interactions, both collocated and geographically distributed worldwide,
through videoconference and RECALL™ connectivity (Fig. 4c).

In such a broad space, i.e., e-Space, d-Space, and g-Space, that provides smooth
transitions between formal and informal settings, learning and work occur
anywhere. Consequently, content, knowledge, and people walk with the individual
like a virtual knowledge bubble (k-bubble).

Bits. The RECALL™ application encodes and synchronizes audio/video and
sketch. Production and replay use a client-server architecture. Once a session is
complete, the drawing and video/audio information is automatically indexed and
published on a web server that allows distributed and synchronized playback of
the session from anywhere at any time. The user is able to navigate through
the session by selecting individual drawing elements as an index and jumping to the
part of interest. The RECALL™ technology invention is currently being patented.

This rich and informal content, i.e., sketch, audio, and video, enables the
participants to communicate the rationale and context in which their concepts,
proposed changes, or questions came up. The interactivity with the content enables
users to access the part of the content that is of interest and manage information
overload.
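
The idea of using drawing elements as an index into the recorded session can be illustrated with a small lookup structure. This is not the RECALL™ implementation, whose internals are not described here; it is a sketch that assumes each sketch element is stored with the time at which it was drawn. The names `Stroke`, `SessionIndex`, and `seek_time` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Stroke:
    stroke_id: str
    start_s: float   # seconds from session start when the element was drawn


class SessionIndex:
    """Maps selected drawing elements to playback positions in the session."""

    def __init__(self, strokes: List[Stroke]) -> None:
        self._by_id: Dict[str, Stroke] = {s.stroke_id: s for s in strokes}

    def seek_time(self, stroke_id: str, lead_in_s: float = 0.0) -> float:
        # Start playback slightly before the element was drawn, never below 0.
        stroke = self._by_id[stroke_id]
        return max(0.0, stroke.start_s - lead_in_s)
```

Selecting an element then reduces to one dictionary lookup, after which the synchronized audio and video can be replayed from the returned time.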

Interactions. The sketch is a natural mode for designers, instructors, or
learners to communicate in highly informal activities such as brainstorming sessions,
lectures, or Q&A sessions. Often a sketch itself is merely the vehicle that spawns
discussions about a particular design issue. Thus, from a design knowledge
capture perspective, capturing both the sketch itself and the discussion that provides
the context behind the sketch is important. It is interesting to note that in today’s
state of practice neither is captured, and knowledge is lost when the whiteboard is
erased. RECALL™ acts as an exploration environment that captures both an
individual memory of ideas and rationale, i-memo, and a team memory, t-memo.

RECALL™ offers some key benefits for producers and consumers of rich
content, such as zero overhead cost for indexing and publishing rich content on the
Web in the form of sketches, audio, and video, as well as real-time interactivity.
In terms of interaction among team members, RECALL™ enables a faster turnover
of information and team feedback; instructors can gain insight into a learner’s
thought process beyond the exercise result, answer, or question; and similar benefits
can be observed in play mode or in customer relationship management. Since the
knowledge is in context, participants can make informed decisions.



5.4 Emerging Changes Influenced by Bricks & Bits & Interaction

Both studies offer insights in terms of socio-technical-environmental changes that
need to be considered from all three aspects: environmental (bricks), technical
(bits), and social (interaction). All three aspects constantly influence each other.

The influence of bricks on bits indicates that the workspace configuration can 
enhance or limit the visibility of participants in a multi-modal videoconference 
and the awareness of shared content displayed on a monitor or screen. Conse- 
quently, better software and hardware that supports zooming of the COI would 
improve the communicative event that has to take place in a fixed and confined 
workspace. A simple solution that can help improve the visibility and awareness 
in a video conference setting would be to change the location of the video camera 
so that there is an overlap of the areas (1) and (2) shown in Fig. 1. 

The influence of bits on bricks leads to changes such as development of flexible 
structural elements and mobile devices in the workspace that adapt and adjust to 
address the needs for visualization, composition and manipulation of rich content, 
or embedded multi-media devices in walls and furniture. 

The influence of bricks on interaction requires participants to change their
behavior and acquire new habits as they move and interact in the workspace, as well as
share the workspace to allow awareness and visibility in different scenarios, i.e.,
individual presence, small or large collocated teams linked to global partners.

The influence of interaction on bricks leads us to rethink the design of
spaces as adjustable workspaces, e.g., with mobile partition walls, flexible furniture,
and network and power infrastructure that allows connectivity anywhere, anytime, to
address individual and team work. In addition, bricks in the form of formal and
informal work, learning, play, and community spaces have to facilitate smooth 
transitions among e-Spaces, d-Spaces, and g-Spaces. 

The influence of interaction on bits directs our thinking towards the design and
development of new software and hardware tools that can, for instance, resolve
visibility and communication problems such as “fake pointing”, i.e., pointing at
information on the screen with hand gestures, and “the gaze”, i.e., providing
eye contact with remote participants in a videoconference, no matter where they look.

The influence of bits on interaction requires that individuals’ behavior and team
dynamics change, as new protocols are formalized and adopted by the participants
to best take advantage of emerging collaboration technologies. Social intelligence
evolves as participants learn how to share and interactively manipulate rich
content, i.e., bits. This process enables a globally distributed or collocated project
team to build a “common ground” or shared understanding of the goals,
constraints, and solution alternatives. The availability of the context in which content
was created opens new dimensions in the understanding of design decisions.
Moreover, shared rich content impacts the level of retention, attention, and the
quality of the communication. Finally, in building new social intelligence,
individuals learn to share more information in a timely fashion at a faster rate, as well
as to become responsive to requests for information. This process leads to faster
design-build decision iterations and shorter time-to-market solutions in an industry
environment, and presumably to a more intense social interaction in any community.

6.1 Introduction

A communicational system using large networks and involving many users can be
seen in two ways. The first is the point of view of the exchange of knowledge and of
shared knowledge between the users, in a cognitive way. The second is the point
of view of the control in and of the system, in a social way. In fact, when using
networks, users have to communicate and use many kinds of knowledge: the exchange
of information is always an exchange of knowledge.

With this practice, users make up a new dynamic social space where problems
of culture, of power, and of social transformation spring up. The question of
control is inherent in such fields. The control may be very deep or soft,
but there is always control, expressed by the society itself, over the goals of
people using networks. Communication is, beyond its technical aspect, a social act
involving a possible transformation of the social structures. The expression of
the control of users in communicational networks is a natural tendency in our
societies, to maintain cohesion and avoid breakdown. Exchange of information between
users is exchange of knowledge and implies the development and the modification
of the users’ groups. This structure and these organizational modifications must be
known to the social and political structures that put in place the networks and
their facilities. But these organizational modifications can also be known by the users
themselves, who can then take on a new social space. In this case, the exchanged
knowledge is automatically augmented with its interpretation and its social implications.

We present the architecture of such a system allowing the representation of the 
meaning of communications between users. The architecture strongly uses the 
multi-agent paradigm. 

6.2 System General Architecture

As we generally believe the world to be consistent, we generally expect the same
from our representation of it. Although we know this isn’t true, we usually
consider, in theory, our ability to perceive the world to be reliable and consistent,
and we downplay the mistakes we may commit when doing so. For example,
classical Communication and Information Systems usually suppose that the people
using them say “the truth”, that is, that they know for sure what they are saying and
don’t lie [7]. Communication Systems often don’t deal well with contradictory
knowledge sources, whether the contradictions come from a lack of correct
information or from malevolence. Another example is the input from robot sensors:
these sensors aren’t perfect, and so neither are the data they transmit. These data can
therefore be contradictory, and we must deal with these contradictions.

Another important characteristic of knowledge in communicational situations is 
its inherently dynamic nature. When we consider a system that has to help people 
in their decision making process in a real time framework, what is right at one 
moment might prove incorrect minutes later [2]. How then can a system cope with
such fluctuating knowledge, and in which way can it express the nature and form
of the control? It obviously has to keep in mind many possible scenarios; in other
words, it has to conceive many possible future worlds in order to match them to
recorded plans so that it can efficiently help in the decision-making process.
However, once the system has chosen some current world representation, it has to
retain the other possible representations so as to be able to alter its current state in
case the actual situation shifts.
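
One way to picture this requirement is a store that keeps every candidate world representation and only designates one of them as current. The sketch below is an illustration under strong assumptions, not the architecture of the system described here: plausibility is reduced to a single number per candidate, and the class name, the scoring scheme, and the scenario labels in the usage note are all hypothetical.

```python
from typing import Dict


class WorldModel:
    """Keeps all candidate world representations; the best-scored one is current."""

    def __init__(self, candidates: Dict[str, float]) -> None:
        self.scores: Dict[str, float] = dict(candidates)

    def current(self) -> str:
        # The current representation is simply the most plausible candidate.
        return max(self.scores, key=self.scores.get)

    def observe(self, support: Dict[str, float]) -> None:
        # New information raises or lowers plausibility; nothing is discarded,
        # so the system can switch back if the situation shifts again.
        for name, delta in support.items():
            self.scores[name] = self.scores.get(name, 0.0) + delta
```

Starting from `{"leak contained": 0.6, "leak spreading": 0.4}`, a message supporting the second scenario by 0.5 makes it the current representation, while the first one remains available.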

We can focus on the six levels of the model of a Communication and Information
System (CIS), which are the organization levels for complex systems [5]:

1. Physical world, objective entities,

2. Space of development of the entities,

3. Movement, organizations, planning,

4. Communication of information,

5. Values, symbols, meaning of the phenomenon, intentions,

6. Rules of the social game, power relations, emergence of the global meaning
of the phenomenon.

The first three levels belong to the field of the classical Information System; the
fourth allows the dynamic organization of the three previous ones. Levels five and
six belong to the social, psychological, and cultural field. They cannot be
represented by a priori defined structures using fixed primal components: the
importance and kind of the psychological and social categories they represent depend
on the current situation itself. For the same reason, they cannot be decomposed into
fixed components. These levels therefore belong to a very complex domain. We are
interested in these last levels in order to take into account intentions, opinions, and
judgments in the communication process, so as to determine the right knowledge to
deliver to the actors.

6.3 Representation of the Semantic of the Communication Act

The approach consists in knowing the situation of communication: the
real and objective facts, and also the mental representations of the situation held by
the actors themselves. It thus includes the factual information and the elaboration of
the decision process, as well as the opinions and judgments of the different actors
about the different situations and about themselves. In this approach, the
intentionality in the act of information exchange takes precedence over the
transmission of neutral information found in classical Information Systems.

We use the notion of an agent as a software entity [8]. Under this notion, an agent is
an action entity defined at the construction step of a software system and
operating in the setting of an open problem to solve. A multi-agent system (MAS)
consists of a set of agent organizations and is situated in an environment
composed of many objects that are not agents and that are essentially and
permanently reactive. This system communicates with its environment through the
action of specific agents called interfacing agents. The agents of the MAS use objects
of their world as well as actions of the other agents to carry out various actions.
They combine their actions to define collective behaviors. The effective, visible
behavior of the MAS is essentially achieved through the behavior of the agents and
is therefore constructed in a distributed manner. It is in the agents, and
essentially in the agents, that the characters of action are distributed, that is, the
effects the system as a whole produces on the environment [1].

We saw in the definition of the CIS that the first three levels describe the objective
situation. These levels are processed by the level of communication of information
(Level 4). We make the hypothesis that some agents can also represent
levels 5 and 6. These levels constitute a specific domain, expressing evaluated
knowledge and subjective, social, and cultural aspects of the situation in progress.
They are above the four previous ones and alter their structure. This is the first
hypothesis of self-reference. They cannot be represented in the system by
functional and static predefined categories: each character in these levels is mainly an
act of communication. This means that each communication is wrapped by many
agents representing the categories of meaning of the evaluated communication.
This set of entities qualifies the communication and physically modifies a part of
the structure of the system itself: they are effective software actions.

So we express categories in levels 5 and 6, at the ontological level, with acts of 
communication [3]. The characterization of the situation according to the different 
actors is represented by the variability of situations, opinions, judgments, and points
of view. The representation of this characterization in the system will be a structural
modification in space and time, wrapping every communication. The main
hypothesis is that a plastic model and plastic software structures are well adapted to
representing a rapidly evolving phenomenon.

6.4 Semantic Traits and Agents

The only model that allows such a plastic representation is the multi-agent
system; we represent the different characters of the communication by many
software agents (cf. Fig. 1). The sentences exchanged between users are
composed of specific words coming from the different ontologies of the discourse
domain. Each word or set of words in each message is located in one or more
ontologies [6]. We call such a word, or group of words, a semantic trait. It expresses
a character of the current situation.
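
Locating the words of a message in the ontologies of the domain can be sketched as a simple lookup. The example below is a deliberately reduced illustration: ontologies are assumed to be plain sets of single words, multi-word traits are not handled, and the function name and the sample ontologies in the usage note are hypothetical.

```python
import re
from typing import Dict, List, Set


def semantic_traits(
    message: str, ontologies: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Map each word of the message to the ontologies in which it is located."""
    words = re.findall(r"[a-z']+", message.lower())
    traits: Dict[str, List[str]] = {}
    for word in words:
        homes = sorted(name for name, terms in ontologies.items() if word in terms)
        if homes:
            # A word found in at least one ontology is treated as a semantic trait.
            traits[word] = homes
    return traits
```

With ontologies such as `hazard`, `location`, and `equipment`, the message "Smoke near the warehouse, alarm is ringing" yields three traits, and a word like "alarm" may be located in more than one ontology.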




Fig. 1. Software agents wrapping the communicational system. 



With each semantic trait, we associate a number of software agents, the so-called
aspectual agents. An aspectual agent is a weak agent reifying a semantic trait. With
each semantic trait, we can associate several aspectual agents, specifying the
semantic trait, its contrary, its opposite, the derived traits, and so on. We thus
obtain, for each semantic trait, a set of aspectual agents that must correspond. For
all the semantic traits expressing the whole ontology of the domain, we have a large
set of aspectual agents that are not independent. The agents are linked by their
acquaintances; they can communicate, awake or kill other agents, co-operate,
and form groups expressing complex associations of semantic traits [3].
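
The awake and kill relations between aspectual agents can be illustrated with a very small model. This is only a sketch of the idea, not the agents of the prototype: "kill" is simplified to deactivation, and the class layout, the `opponents` list for contrary traits, and the trait names in the usage note are assumptions made for the example.

```python
from typing import List, Optional, Set


class AspectualAgent:
    """A weak agent reifying one semantic trait."""

    def __init__(self, trait: str) -> None:
        self.trait = trait
        self.active = False
        self.acquaintances: List["AspectualAgent"] = []  # agents it awakes
        self.opponents: List["AspectualAgent"] = []      # agents of contrary traits

    def awaken(self, _seen: Optional[Set[int]] = None) -> None:
        seen = set() if _seen is None else _seen
        if id(self) in seen:        # guard against cycles of acquaintances
            return
        seen.add(id(self))
        self.active = True
        for other in self.opponents:
            other.active = False    # "kill" is simplified to deactivation
        for other in self.acquaintances:
            other.awaken(seen)
```

If the agent for "danger" has the agent for "fear" as an acquaintance and the agent for "safety" as an opponent, awakening "danger" activates "fear" and deactivates "safety", which is one elementary step in the formation of a group of associated traits.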



6.5 Aspectual Agent Organization 

Then, for each sentence exchanged between users, we have a number of semantic
traits expressed in a set of activated aspectual agents, namely the agents that match
the different semantic traits, with one group for the sender and another for the
recipients. We augment these semantic traits with some subjective aspects of the
users’ perception of the situation, such as judgments or feelings like fear, dread,
satisfaction, lying, and so on. We reify these subjective semantic traits with other
aspectual agents. These agents awake others by observing aspectual agents and so
bring about the emergence of the semantic traits they match (cf. Fig. 2). In this way,
we can express by agents the six levels of the CIS, including the judgments and
feelings expressed by users.

The aspectual agents awake or kill other agents, struggle with some,
co-operate with others, and form what we call an agent landscape, a very dynamic
agent organization expressing, with augmentation, the semantics of each
communicated sentence. Moreover, the aspectual agents take into account the
organizational state of the current aspectual organization of each user receiving a
new message. They "set in situation" the current message, taking account of the
previous ones: they constitute an organizational memory.




Fig. 2. The aspectual organization operating on the semantic traits 

The behavior of these agents, their internal transformations, and their
communication realize the spatial and temporal organization of levels 5 and 6. The
global characters that can be found in the multi-agent system are emerging characters.
Thus, those agents, with their own particular behavior, may disturb the organization
of the system and make it self-reorganize to exhibit new emerging characters.

In a MAS, expected or unexpected structures may appear. We make the
hypothesis that emerging structures express the meaning of the communications
between users, describing them only in a geometrical way that we call a morphology.
This emerging structure represents the accurate views about the different perceptions
of the phenomenon elaborated during communication. Because the system is
dynamic, all the emerging structures change according to the evolution of the
phenomenon perceived by the users. So the agent structure and its evolution reflect
the organization and the evolution of the perceived phenomenon itself.

This aspectual organization will grasp the communicational data in order to
extract their characteristics. The aspectual agents represent, by their actions, their
behavior, and their inner states, the emergence of semantic traits, taking into account
their proximity to the other, previously expressed semantic traits.



6.6 The Emerging Meaning of the Communication: The Morphological Agent Organization

The previously defined aspectual agents allow the expression of the meaning of
each semantic trait of the communication in an act of communication. The set of
all the MAS wrapped around each concrete actor allows the expression of the whole
meaning of the communicational situation. This meaning is generated by
emerging structures expressing the morphology of the set of MAS. For this, we have
defined the notion of the form of the agent landscape [4], that is, the transformation
of the agent landscape in a geometrical way. This is an important point of our work,
where we study the coherence and stability of MAS expressing global sense while
using geometrical characters of the MAS.

The goal is to build a structural and immediate connection between the set of
actors’ ideas and the landscape of agents. This notion is central in the model and is
understood as a genuinely new form of meaning, expressing with many agents the
synthesis of particular forms (the aspectual agents) around the different concrete
actors.

Given the very great number of aspectual agents, it isn’t possible to follow them
individually. We therefore study them as a whole, distinguishing shapes and forms
in the interactions. We appreciate a form in a geometrical way, using the specific
organization of the morphological agents. We call this view of the aspectual
agent organization, considered as a population, an agent landscape [3], [7]. An
agent landscape is a space expressing the active aspectual agents, considered as well
understandable.

In the system, an agent landscape is represented by specific projections of the
studied agent organization onto eight axes. Such a representation defines
in fact a new space of dynamic description of any agent organization. The eight
space dimensions are the following:

• organizational distance: the state of the agent compared with the state of 
the whole agent organization, 

• velocity: the speed with which an aspectual agent has developed so far, 

• facility: the ease with which an aspectual agent has developed so far, 

• supremacy: a measure of the ratio of enemies to allies of each aspectual agent,

• complexification: a measurement of the evolution of the inner structure of 
the agent in us 

• intensity of the internal activity: the expression of the exchanges between 
the inner components of aspectual agent before action, 

• persistence: a measurement of the time of life of the agent, 

• dependency: the fact the agent is or is not free or dependent. 
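
The eight dimensions listed above can be pictured as the coordinates of each aspectual agent in the landscape. The following sketch assumes that every dimension has already been reduced to a number (for example, dependency as 0.0 or 1.0); how each measure is actually computed is not specified here, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass
class LandscapePoint:
    """Coordinates of one aspectual agent along the eight axes."""
    organizational_distance: float
    velocity: float
    facility: float
    supremacy: float
    complexification: float
    internal_activity: float
    persistence: float
    dependency: float


# Axis names in the order in which the dimensions are listed.
AXES = [f.name for f in fields(LandscapePoint)]


def project(
    points: List[LandscapePoint], axis_x: str, axis_y: str
) -> List[Tuple[float, float]]:
    """Project the agent population onto two chosen axes of the landscape."""
    return [(getattr(p, axis_x), getattr(p, axis_y)) for p in points]
```

Projecting the population onto, say, `velocity` and `persistence` gives one two-dimensional view of the landscape; other pairs of axes give other views of the same organization.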

We express the characters (the dimensions) of this space using specific agents.
The morphological agents are the expression of the aggregation of aspectual
agents in the landscape made with these agents, according to those eight criteria.
The set of morphological agents forms a kind of dynamic space; each point in this
space is in fact a morphological agent. Such an expression of a massive set of
agents is the fundamental result allowing the development of the system.



6.7 Interpretation of the Morphological Organization: The Evocation Agents

The morphological agents provide the stabilized state of the aspectual
organization, which corresponds to a fixed point of the mirroring process. The reading
of the morphology, that is, the representation of the calculations done by aspectual
agent aggregations, provides the emergence of the sense of what has been effectively
calculated by the aspectual agents, while taking account of the morphological agents
of engagement. This notion of emergence thus has a strictly organizational character.

But we will not remain at the level of the simple expression of morphological
agents in groups. The system must take account of the significance of this
morphology, that is, memorize it in an organizational way, so as to take it into
account implicitly in its future activations and in its future engagements. That is to
say, the system functions like an organizational memory.

Another organization of agents, after the aspectual agents and those of
morphology, will take into consideration the state of the landscape of
morphological agents to achieve an analysis of its own morphology. The aim is to
represent the sense of the activation of the aspectual agent organization, from its
characters of aspect expressed by morphological agents. An organization of
agents, the agents of evocation, will provide a cognitive view of what
has been expressed by the geometric and semantic information coming from the
landscape of morphological agents, above the aspectual agent landscape.

Agents of evocation, which have a classical structure, represent
categories of significance between the action of the robot, the activity of its interfacing
agents, the computational development of its behavior, and the representation of
this development by morphological agents. They express the global consistency of
activation by making choices and decisions about global behavior, by keeping
strategies of inhibition of action for certain interfacing, aspectual, or morphological
agents, and by thus controlling the general line of organizational emergence
achieved in the system.

Note that these strategic actions will be indirect in relation to every
agent's behavior, making it possible to constitute a system with emergence of sense,
with its intrinsic characters of non-stability and of learning by structural distortion
only. The systemic loop is now closed.

6.8 Conclusion

We have applied such a system to the management of crisis situations in industrial
disasters, and we have developed a prototype for the simulation of communications
between actors, coded in Distributed Smalltalk™ [7]. The task at hand was
to build an understanding of a dynamic, conflicting situation, perceived by the
Evaluation System through exchanged messages that are potentially incoherent or
conflicting. To reach this difficult goal, we have proposed an architecture for an
Evaluation System based on the morphology of the behavior of aspectual agent
organizations. We can transpose and apply this model to the dialogs within any
virtual user community. This is a research program in which we have to express the
ontologies of the exchanged and shared knowledge used by users and adapt the
Evaluation System to the environment of each user, above his usual
communicational interface.

7.1 Introduction

The Internet has become a social place. It allows us to exchange our thoughts
and opinions with other people who have similar interests or goals.
However, existing communication systems such as e-mail, BBS (Bulletin Board
System), chat, and instant messaging systems have limitations in eliciting
and circulating opinions in a community^ because of communication costs
that hinder the exchange of various opinions between community members. We
consider that social intelligence is a property of a community that enables the
members to exchange and evolve their implicit knowledge. To augment the social
intelligence of a community, it is necessary to facilitate the elicitation and
circulation of hidden opinions of the members by reducing the communication costs.

We have developed the Public Opinion Channel (POC) prototype system,
which reduces the communication costs. POC is a concept of an automatic
community broadcasting system [7.1] [7.2]. POC elicits and circulates community
members’ opinions by providing stories to the members. A story is a
digest of opinions in the community. Although the members have their own
opinions, they often hesitate to express them to others. By receiving
stories, the members can easily find implicit opinions in their community,
including not only major but also minor opinions, and are encouraged
to express their own. The POC prototype system allows members to listen
to the stories as a radio program, to view various opinions passively, and to
send their opinions as anonymous short messages.

¹ A community here is a group of people who have the same interests and goals,
and who discuss and work together on the Internet.
Table 7.1. Comparison of costs for receiving, sending, selecting a message between
an e-mail system and other communication systems.

                     Receiving   Sending   Selection
E-mail (baseline)    -           -         -
BBS                  High        Medium    High
Network news         High        Medium    High
Chat system          Low         Low       Low
Instant messaging    Low         Low       Low

7.2 Communication Costs 

The communication costs referred to here are the cognitive resources expended
in receiving, sending, and selecting a message with communication tools
on the Internet. There are three kinds of communication costs: (1) receiving
cost, which is the cost of a user receiving and comprehending a message
through the communication system, (2) sending cost, which is the cost of a
user preparing and sending a message, and (3) selection cost, which is the
cost of a user selecting a message to read.

Table 7.1 compares the communication costs of several communication
systems against those of an e-mail system, such as Eudora² or Outlook³, that
receives and sends only plain text messages. A message referred to here is a
unit of information, such as an article on a BBS or from network news, or one or
several lines of text in chat systems and instant messaging tools⁴.

BBS and network news incur high costs for receiving and selecting a 
message. This is because a user has to keep track of messages in order to 
partake in discussions. When BBS and network news are updated, it becomes 
difficult to follow discussions. Furthermore, selecting messages from a large 
number of messages from BBS and network news is difficult. 

A chat system and an instant messaging tool incur low costs in all three
categories. This is because these systems handle short messages consisting of
one or several lines of text. Thus, a user can receive and comprehend the
contents of a message easily and instantly. In effect, users can send their
thoughts with these systems just as if they were talking.

From this comparison, a communication system should be designed to
meet three requirements: (1) it should allow a user to attend discussions
without requiring them to keep track of the discussions, (2) it should help a
user to find or select a message they actually want to read, and (3) it should
allow a user to send short messages.

In addition to these requirements, we added the following in order to help
community members acquire stories and to encourage them to express their
opinions: (4) anonymous messaging, which allows community members to send
their opinions without revealing personal information such as their names,
(5) passive viewing of opinions, which enables the members to view opinions
without any operations, and (6) continuous broadcasting, in which a POC
broadcasts stories at all times by generating new stories or rebroadcasting
existing stories.

² http://www.eudora.com/
³ http://www.microsoft.com/office/outlook/default.htm
⁴ Including Yahoo Messenger and AOL Instant Messenger.

Fig. 7.1. An overview of POC prototype system.



7.3 POC Prototype System 

The POC prototype system consists of a POC server (community broadcasting
server) and several POC clients. Figure 7.1 shows an overview of the
POC prototype system. A POC server is a broadcasting system that provides
(1) opinions for supporting discussions between community members, and (2)
stories for notifying the members of selected opinions. A POC client is a
tool for (1) listening to stories, which the POC server provides as a radio
program, and (2) exchanging opinions between the members for discussion.
In this section, we describe the story broadcasting function of the
POC server and the discussion support function of the POC client.

7.3.1 POC Server 

The POC server has two roles: (1) discussion server, which provides opinions
to the POC clients to facilitate discussions between community members,
and (2) broadcasting server, which generates stories and broadcasts them as a
radio program. We describe the latter role in this subsection.






T. Fukuhara, T. Nishida, and S. Uemura 



Table 7.2. Example of a story.

DJ          Next opinion is “affordance”.
Opinion 1   Does anyone know about affordance?
DJ          Related to this opinion, there is another opinion.
Opinion 2   There is a workshop on designing intelligent artifacts.
            This is a good introduction to affordance.
DJ          This is the last opinion.
Opinion 3   I found a good page on cognitive psychology when I was
            searching affordance.
DJ          Thanks all. We’re waiting for your opinions.



Generating stories. The POC server generates a story that has a context.
Context here means the semantic relationship between sentences. The context
is made by linking related opinions and is generated as follows.

1. Pick out an opinion (the source opinion) from an opinion database.

2. Retrieve opinions from the opinion database by using the title of the
source opinion.

3. Sort the retrieval results by date, and add the first n opinions to the
source opinion (n is a threshold).

An example of a story is shown in Table 7.2. In this example, a DJ, who plays
the role of a disc jockey in a radio program, introduces three opinions related
to “affordance”. These opinions are sorted by date.
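
The three steps above can be sketched in Python as follows. This is only an illustrative sketch, not the code of the POC server: the Opinion class, the function name generate_story, the substring test used to decide that an opinion is "related", and the fixed DJ phrases (modelled on Table 7.2) are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Opinion:
    # Hypothetical record type; the real opinion database schema is not given.
    title: str
    comment: str
    date: datetime


def generate_story(source, database, n=2):
    """Build a story: the source opinion plus the first n related opinions by date."""
    # Step 2: retrieve opinions using the title of the source opinion.
    # Assumption: "related" means the title occurs in the other opinion's text.
    key = source.title.lower()
    related = [
        o for o in database
        if o is not source and (key in o.title.lower() or key in o.comment.lower())
    ]
    # Step 3: sort by date and keep the first n opinions (n is the threshold).
    related.sort(key=lambda o: o.date)
    opinions = [source] + related[:n]

    # Interleave DJ lines with the opinions, loosely following Table 7.2.
    lines = [("DJ", f'Next opinion is "{source.title}".')]
    for i, opinion in enumerate(opinions, start=1):
        if i > 1:
            is_last = i == len(opinions)
            lines.append(("DJ", "This is the last opinion." if is_last
                          else "Related to this opinion, there is another opinion."))
        lines.append((f"Opinion {i}", opinion.comment))
    lines.append(("DJ", "Thanks all. We're waiting for your opinions."))
    return lines
```

Each returned pair is a speaker label and a line of text, so the result can be passed on to a text-to-speech step or displayed directly.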

Broadcasting stories. The POC server broadcasts stories as radio programs
on the Internet by means of an MP3 audio stream. The POC server
generates audio files with a text-to-speech (TTS) system and broadcasts
them via an MP3 streaming server. The POC server uses CHATR⁵ for TTS and
icecast⁶ for the MP3 streaming server. A user can listen to the stories with
MP3 players such as WinAmp⁷. We regard MP3 players as the POC client
for listening to the stories.

7.3.2 POC Client: POCViewer 

In this subsection, we describe the discussion support function of the POC
client. We have developed an implementation of the POC client named POCViewer
that supports the exchange of opinions between community members. With
the POCViewer, users can view opinions passively, and compose and send
their opinions to the POC discussion server. Figure 7.2 shows a screen image of
the POCViewer. The POCViewer shows opinions in the Telop style, i.e., the
characters of a story appear one by one. The POCViewer has several functions
for facilitating discussions.

⁵ http://results.atr.co.jp/products_e/frame9.html
⁶ http://www.icecast.org/
⁷ http://www.winamp.com/
Fig. 7.2. A screen image of the POCViewer.



Table 7.3. An example of a message.

<?xml version="1.0" encoding="Shift_JIS"?>
<opinion name="tem_imf" date="2001/5/6 20:52:18"
         host="192.168.31.163" reference="comment5.xml">
  <title>ATM service in Japan</title>
  <comment>
    I think ATM services in Japan are inconvenient.
    Banks should run their ATMs for 24 hours.
  </comment>
  <url>http://www.japanese-online.com/language/bank.html</url>
</opinion>



Opinion composer. A user can compose, edit, and send their opinion to
the POC server. The user can save an opinion as a local opinion, which is
stored on the local hard disk, and modify or browse it later.

An example of an opinion is shown in Table 7.3. The opinion consists of
a title, a comment, and a reference URL. When the user sends an opinion,
they enter the title and comment in the POC client. The POC client adds
XML tags to the opinion and sends it to the server.
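
As an illustration of this tagging step, the following Python sketch wraps a title, a comment, and a URL in the element and attribute names that appear in Table 7.3. The function name build_opinion_xml and its parameters are hypothetical, and the sketch returns a plain string rather than reproducing the Shift_JIS encoding or the transport used by the actual client.

```python
import xml.etree.ElementTree as ET


def build_opinion_xml(title, comment, url, name, date, host, reference=""):
    """Wrap a user's title and comment in the XML structure of Table 7.3."""
    # Attribute names follow Table 7.3; how the real client fills them is assumed.
    attrs = {"name": name, "date": date, "host": host}
    if reference:
        attrs["reference"] = reference
    opinion = ET.Element("opinion", attrs)
    ET.SubElement(opinion, "title").text = title
    ET.SubElement(opinion, "comment").text = comment
    ET.SubElement(opinion, "url").text = url
    # Return the document as text; a real client would also pick an encoding.
    return ET.tostring(opinion, encoding="unicode")
```

Building the message with an XML library rather than by string concatenation has the side benefit that characters such as "&" or "<" in a user's comment are escaped automatically.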

Local mode and network mode. A user can select the mode of the
POCViewer as either local mode or network mode. In local mode, the user can
compose opinions and store them on the local hard disk. In network mode, the
user can not only send their opinions but also view and capture the opinions of
their community. Local mode is suitable for composing and viewing personal
opinions. By separating the local and network modes, the user can store
tentative opinions on the local hard disk and send mature opinions to
the server.

Fig. 7.3. Continuous retrieval. Retrieval is made by extracting keywords from
previous retrieval results, and retrieving them continuously.

Capturing opinions. A user can capture opinions to the local hard disk
and view the captured opinions in local mode. The user can also edit
the captured opinions and send them to a POC server.

Opinion retrieval. A user can retrieve opinions in network mode. The actual
retrieval process runs on the POC server. The POC server uses the
n-gram search method, which searches messages for pieces of the query
consisting of one or two characters [7.3]. This method has the advantage
of also retrieving texts that match the query only partially. Thus, the user
can view various stories.
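
The idea of matching on short pieces of the query can be sketched as follows. This is a simplified illustration under our own assumptions, not the method of [7.3]: the function names are hypothetical, the query is split into overlapping two-character pieces, and messages are ranked by how many distinct pieces they contain.

```python
def query_pieces(query, n=2):
    """Split a query into overlapping character n-grams (bigrams by default)."""
    text = query.lower()
    if len(text) <= n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_search(query, messages, n=2):
    """Return messages sharing at least one n-gram with the query, best match first."""
    pieces = query_pieces(query, n)
    scored = []
    for message in messages:
        text = message.lower()
        hits = sum(1 for piece in pieces if piece in text)
        if hits:
            scored.append((hits, message))
    # The sort is stable, so equally scored messages keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [message for _, message in scored]
```

Because a message needs to share only some pieces with the query, a search for "agent" in this sketch also returns a message that contains "agency", which illustrates the partial matching described above.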

Continuous retrieval. A user can view sets of similar opinions continuously,
because the POCViewer can retrieve opinions continuously. When a user retrieves
opinions with a keyword, the POCViewer takes another keyword from the retrieval
results and retrieves a further set of opinions with that keyword. Figure 7.3
shows an overview of continuous retrieval.

The user can view a set of opinions based on the retrieval results. In
Figure 7.3, opinions related to the keyword “Agent” are retrieved. When
continuous retrieval mode is off, no further retrievals are performed. When
continuous retrieval mode is on, further retrievals are performed, each by
extracting a keyword from the previous retrieval results. The keyword is
picked out according to the feature value of a word. In the implementation,
we use the frequency of the word as the feature value. Retrieval thus
continues according to the previous results, and the user can view further
opinions originating from the initial keyword given by the user.
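
A minimal Python sketch of continuous retrieval is given below. Only the use of word frequency as the feature value comes from the description above; the function names, the substring-based retrieve step, the stop-word list, and the rule that an already used keyword is not chosen again are assumptions added to make the example self-contained.

```python
import re
from collections import Counter


def retrieve(keyword, opinions):
    """Return opinions that contain the keyword (case-insensitive substring match)."""
    return [o for o in opinions if keyword.lower() in o.lower()]


def next_keyword(results, used, stopwords=frozenset()):
    """Pick the most frequent word in the results that has not been used yet."""
    counts = Counter(
        word
        for text in results
        for word in re.findall(r"[a-z]+", text.lower())
        if word not in used and word not in stopwords
    )
    return counts.most_common(1)[0][0] if counts else None


def continuous_retrieval(keyword, opinions, rounds=3, stopwords=frozenset()):
    """Yield (keyword, results) pairs, chaining each retrieval from the previous one."""
    used = set()
    for _ in range(rounds):
        used.add(keyword.lower())
        results = retrieve(keyword, opinions)
        if not results:
            return
        yield keyword, results
        keyword = next_keyword(results, used, stopwords)
        if keyword is None:
            return
```

With rounds=1 the generator behaves like the case in which continuous retrieval mode is off: only the retrieval for the initial keyword is performed.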
7.4 Evaluation

We performed two preliminary experiments in which we applied the POC prototype
system to a practical community. One is a long-term observation of opinions
in a group [7.4], and the other is a short-term observation in a group thinking
situation.

The first experiment evaluated the exchange of implicit opinions in a group.
It ran for three months. The group consisted of eight members, all Japanese,
who were familiar with one another. A total of 1,329 opinions
were collected during this experiment. The members exchanged opinions
on various topics, including not only their business but also movies and TV
programs. Some opinions referred to other members’ opinions, and the
others were monologues. Although the members posted many opinions, we found
that discussions did not last long. We consider the reason to be
that the members had got used to the “couch potato” style of viewing
opinions, because the POCViewer shows the opinions automatically.
Facilitating discussions in the POCViewer is our future work.

The second experiment evaluated creativity support by POC [7.5]. Miura
argued that POC enabled group members to find opinions to which they
had not paid attention. In this experiment, members discussed demands
or requests from their university using the POC system. The POC server
periodically broadcast opinions in order to provide the members with various
viewpoints. In this experiment, circulating opinions enabled members to
recognize the importance of previous opinions. We will continue the evaluation
of creativity support by POC.



7.5 Discussion 

7.5.1 Automatic Broadcasting System 

Tanaka et al. proposed information visualization tools using a TV program
metaphor [7.6]. With these visualization tools, the user can view Web
documents or retrieval results from a database in a passive viewing style, like
watching a TV program.

One of the major differences between POC and the information visualization
tools is the source of the story. We treat community members’ opinions as
the source. This makes the story generation method different from that of the
visualization tools, because minor opinions must be distinguished from major
ones. In the concept of POC, POC takes up not only major opinions but also
minor ones. This requirement is essential for fair discussions in a community.
Although we have not implemented this function yet, we consider it important
for an automatic broadcasting system for a community to find minor opinions.
7.5.2 POC and Narrative Intelligence

Lawrence et al. proposed using storytelling to exchange knowledge in a
group [7.7]. They argued that storytelling has a function of collecting and
sharing knowledge. One point of POC with regard to narrative intelligence
is that opinions in the POCViewer become seeds of narratives. In the
long-term experiment, we found that several opinions became seeds of
narratives, i.e., community members replied to the opinions by adding
thoughts or memories related to them. However, it is not yet clear what kinds
of opinions are suitable as seeds of narratives that cause further replies.
Analyzing such opinions is future work.



7.6 Conclusion 

We have developed a POC prototype system for eliciting and circulating
opinions in a community. The system augments social intelligence by reducing
the communication costs. From the experiments, we found that the POC
prototype system is useful for (1) eliciting and circulating various implicit
opinions in a community, and (2) supporting creativity in a community.
8.1 Introduction

Increasing concern for environmental problems has contributed to the general
awareness of the importance and difficulty of engaging a range of
stakeholders in the decision-making process. Those who are involved in making
decisions, and those affected by the decisions made, from the authorities to
members of the public, should be offered an opportunity to engage in informed
deliberation, in which views from various perspectives are raised, examined,
discussed and taken into consideration before an attempt to reach a consensus
is made.

Our ultimate goal is to make such a process of public deliberation, i.e., a
“formal or informal process for communication and for raising and collectively
considering issues” [1], as effective and meaningful as possible by supporting the
community of stakeholders with information tools [2]. This takes into account the
increasing number of initiatives by national and local authorities around the
world to make environmental data and information electronically available, and
the ever-increasing popular access to the Internet, which offer a potential for
new knowledge to emerge through interactions of people over the network [3]. In
this context, we view “social intelligence” as the potential capability of a
community to engage in an informed deliberative decision-making process, and the
“traces” left behind by such a class of cooperative activity.*

The importance of bringing together the wide spectrum of concerned groups
and individuals in realizing a sustainable society cannot be overstated. The
necessity of partnership between concerned members of the community, such as
individual residents, policy makers, industry and NPOs, and the establishment of
communication channels for such a collaborative enterprise are repeatedly
emphasized [4]. Mere dissemination of information, however it may be designed to
cater for the presumed interests of other parties, is not enough: there must be
a place for a dialogue based on the information made available.

Such a vision of partnership in social decision-making among various cohorts
within a community is, however, not simple to achieve. In reality, there are a
number of factors that would hamper and prevent its effective implementation.
Among them is the difficulty for the public to participate in discussions due to
a presumed lack of confidence concerning technical issues that might arise. There
could be a breakdown in communication due to jargon and technical terminology.



* This is by no means meant to be the definition of “social intelligence”, but an example of 
circumstances under which it would manifest itself. 



T. Terano et al. (Eds.): JSAI 2001 Workshops, LNAI 2253, pp. 59-66, 2001. 
At the same time, while information concerning public policy can be
disseminated, the reverse does not hold: it is difficult for authorities to
obtain feedback from members of the community, and when they do, the feedback
might carry a hostile tone.

We believe that some aspects of the problems raised above can be tackled by
providing support for members of the community to have their voices heard more
effectively. In our ongoing research, we attempt to address this issue by enabling
individuals to take part in informed discussions through the provision of
supporting tools aimed at enhancing public discourse. Our current research effort
is the development of a network-based discussion system that is aimed at
supporting public deliberation on environmental issues [2]. In deliberation, it is
assumed that there are both consensual and adversarial processes, since the
participatory collective would often include those with opposing agendas and
different values. It is not merely a discussion forum; it should be an iterative
process in which the overall aim is to reach a consensus (closure), to increase
participants’ understanding of the issues raised and of the different positions
assumed by other participants, or both. Thus, deliberative processes are often
seen as essential in making informed, collective decisions. The intended use of
the system is primarily for asynchronous discussions in limited domains for which
simulation-based analytical tools are available.

The feature that is required in terms of interaction design is the facility for the 
participants to contribute to the discussion without too much overhead and psy- 
chological barriers that hamper the representation of non-technical or novice 
views. In the past, such a technical divide has often led to the total breakdown of 
exchange of views between experts and non-experts in the environmental forum, 
resulting in typical standoff situations between the two antagonistic camps. In- 
stead, in order to promote public deliberation and sharing of the responsibility of 
collective decision-making by the community, voices should be heard and argu- 
ments should be understood. 



8.2 Enabling Individuals to Collect and Exchange Information 
and Opinions 

Recent interest in community-oriented (intelligent) information systems projects
grouped under labels such as “community computing” and “communityware”
highlights the focus on communities, with the aim of supporting their formation
and their activities. Despite the emphasis on the “community”, we observe that an
essential element of these community-oriented systems is the enhancement of the
interactive capabilities of individual members of a community. This is a natural
consequence, since community activities are often decentralized and bottom-up,
with their strength being the capability to generate emergent solutions to
ill-structured problems for which individual participation is essential.

We see the potential in community-oriented infrastructure such as Nishida’s
“Public Opinion Channel” (POC) [5] as a vehicle for increasing social awareness
in terms of information exchange and perspective sharing. For such purposes, a
conversational interface such as EgoChat [6] can be considered a natural
candidate for people to interact through a platform such as POC. In EgoChat,
conversations between personal agents are generated and sustained through
keyword matching in a conversational database (“conversation base”) that stores
conversational fragments (utterances) of the users these agents represent. A human
user observes the conversation that unfolds between these agents and can interrupt
the conversation and “talk” to the agents, thereby increasing the data in the
conversation base of his personal agent. The context of the utterance is decided
by the topic they (i.e., the user and the agents) are “talking” about.

The significance of such a system is that it uses the everyday form of informa- 
tion exchange in communities, viz. conversations, as the means to elicit informa- 
tion from the human participant. This takes advantage of the nature of human 
conversation suggested by Schank that humans do not necessarily create new 
knowledge through conversations but present what they have already thought 
about, reformulated in the form appropriate for the conversation [7]. 

Conversation bases held by each personal agent for members of a community
together store a rich source of opinions and information held in that community.
However, conversational fragments stored as text lose the non-verbal
information, such as shared visual information and gestures, that is present in
ordinary face-to-face conversations. Interactions with such a conversation base
should ideally be multi-modal, incorporating visual and audio data to accompany
information broadcast. We are currently experimenting with the use of wearable
devices for information gathering, with the possible effect of grounding
information to enable individuals to share with the community what they saw and
heard, which would provide a firmer context in which one’s opinions were raised.

Figure 1 illustrates a conversational agent interface modeled after EgoChat,
which stages conversations between users’ personal agents. In this demonstrator,
an entry in the conversation base contains not only the text of an utterance but also
the accompanying visual information at the time it was made, taken through a
head-mounted camera and stored in a wearable PC, and the gesture information that
is captured through a motion capture device with sensors attached to the speaker’s
arms and interpreted by a simple gesture recognition system. Once uploaded to
the conversation server, visual and gesture information is shown in accordance
with verbal speech. Our initial experience with the system suggests that, if
non-verbal information is seen as the augmentation of verbal information, i.e., as
modifiers of keywords, such a multi-modal conversation base is useful when an
utterance involves the use of indicatives such as “such” and “like this”, since
such information is sometimes not easy to elaborate in words.

While the use of mobile devices may seem rather cumbersome at this point in
time, we believe that image capture and transmission will in the near future be
as ordinary as voice transmission. Information about the community, such as the
traffic situation and rising water levels in nearby streams, can be collected and
backed up with images. Using community members and their everyday awareness of
the environment offers the possibility of collecting environmental information
biased towards the concerns of the community members.
Fig. 1. Providing multi-modal information for the conversation base



8.3 Raising Social Awareness through Position-Oriented
Discussions



When a group of concerned individuals share the overall goal of reaching some
form of consensus on an issue through deliberation, discussions can be seen as
cooperative work. Discussions, however, contain only a weak representation of
their common field of work. In other words, the object of cooperation, in this
case deliberation and possibly consensus formation, is often poorly represented,
and it is not easy for the cooperative ensemble (i.e., the participants) to
monitor its progress. Take, for example, threaded discussions, which are one of
the most common forms of electronic bulletin board systems (BBSs) and are
incorporated in some groupware applications as the issue-based information
systems (IBIS) style of discussion threads. When the discussion is relatively
small, a user can easily monitor what is going on by skimming through the
contents of contributions. As it grows, unless the user is a very active
participant in the discussion, it becomes difficult for her not only to monitor
the development of argumentation and sub-topics but also to participate in it.
In computer-mediated discussions, the visualization of argumentation addresses
this issue, and systems such as Conklin’s gIBIS [9] have been proposed. In the
area of scholarly discourse, collaborative argumentation [8] has been proposed,
offering hypertext-based solutions. However, neither of these approaches is
adequate in dealing with potential problems such as the dominance of “loud”
voices, the effects of inactive but essential participants (“lurkers”), and the
difficulty of assessing how one is understood by other participants.



8.3.1 Positioning-Oriented Discussion Interface 

We address these problems through a form of interaction involving a graphical 
interface that encourages participants to directly manipulate their positions in the 
opinion space, thereby visualizing participants’ positions in a discussion. Figure 2 
shows an example of the opinion space: 

- It describes a two-dimensional space (the “board”) with horizontal and vertical 
axes representing two of the factors (issues) in the discussion. We believe that 
ordinary users would not be able to cope with more than two dimensions, espe- 
cially when it comes to positioning themselves in the opinion space. 

- Each participant is assigned an icon, which can be a piece with a designated 
colour, or her own image such as a photo or cartoon. 

- To make a contribution, the user “moves” her piece to the position on the board
that she thinks describes her stance in the discussion, and types in the argument
or justification for her move. Naturally, she does not have to actually
“move”; she can remain in the same position and contribute her opinions.

- In addition to the graphical interface, each contribution is listed in a table, along 
with information about the contributor, direction of move, and a time stamp. 

- Labels of each axis that define the opinion space are changed according to the 
development of the discussion. Once the labels are changed, the positions are 
reset to the neutral position (i.e., the origin). 

In this way, users can position themselves with respect to their perception of
other participants’ positions, revealing how contributions are perceived and
interpreted among participants. In our preliminary experiment using this interface,
the comments we received after the session mentioned the clarity of mutual
positions on issues, seen relative to the positions of other participants,
and the effect of the interface in keeping the discussion focused on issues
without diverging too much. Some of the effects of visualizing positions we
identified were as follows:

- By “playing back” the changes in the board, the participants were able to recall 
the flow of discussion and how it unfolded. 

- Participants seemed to have retained information as to how other participants 
changed their opinions. 

- By observing the change in opinion, especially at “crossing the axis”, we can 
analyse what made the participant change their views and its justifications. 

Moreover, it provides visual information as to how diverse existing opinions 
are and how the discussion has contributed to participants closing in (or growing 
farther apart) on issues being discussed. 
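
To make the mechanics of the opinion space concrete, the following Python sketch models the board, the contribution table, the reset on relabelling, and the detection of moves that cross an axis. It is an assumption-laden illustration rather than the implementation of our system: the class and method names are hypothetical, positions are plain coordinate pairs with the origin as the neutral position, and a running counter stands in for the time stamp.

```python
from dataclasses import dataclass, field


@dataclass
class Move:
    participant: str
    old: tuple
    new: tuple
    argument: str
    step: int  # stands in for the time stamp in the contribution table


@dataclass
class OpinionSpace:
    x_label: str
    y_label: str
    positions: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def join(self, participant):
        """Place a new participant at the neutral position (the origin)."""
        self.positions[participant] = (0.0, 0.0)

    def contribute(self, participant, argument, new_position=None):
        """Record a contribution; the piece stays put if no new position is given."""
        old = self.positions[participant]
        new = old if new_position is None else new_position
        self.positions[participant] = new
        self.log.append(Move(participant, old, new, argument, len(self.log)))

    def relabel(self, x_label, y_label):
        """Change the axis labels and reset every piece to the origin."""
        self.x_label, self.y_label = x_label, y_label
        for participant in self.positions:
            self.positions[participant] = (0.0, 0.0)

    def axis_crossings(self):
        """Return the moves in which a coordinate changes sign."""
        return [m for m in self.log
                if m.old[0] * m.new[0] < 0 or m.old[1] * m.new[1] < 0]
```

The list returned by axis_crossings pairs each change of side with the argument typed in at that moment, which is the information needed to examine what made a participant change her view.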
Fig. 2. Discussion interface through positioning



When seen as the explicit indication of one’s preferences, positioning in the 
opinion space can be seen as a form of informal voting, and the distribution of 
pieces as the tally of the vote. While taking a formal vote is often avoided for the 
fear of making premature decisions, taking informal votes is considered to be use- 
ful in discussions [10]. Therefore, we expect the advantages (and disadvantages) 
of taking informal votes during discussions to be inherited in this interaction. 
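
One simple way to read the distribution of pieces as a tally is to count, for each axis, how many pieces lie on each side of it. The short Python sketch below does this; the function names and the choice of counting by the sign of each coordinate are assumptions made for illustration, and other summaries (for example, counting by quadrant) would serve equally well.

```python
from collections import Counter


def side(value):
    """Classify a coordinate as on the positive side, the negative side, or neutral."""
    if value > 0:
        return "+"
    if value < 0:
        return "-"
    return "0"


def informal_tally(positions):
    """Count how many pieces sit on each side of the horizontal and vertical axes."""
    x_votes = Counter(side(x) for x, _ in positions.values())
    y_votes = Counter(side(y) for _, y in positions.values())
    return x_votes, y_votes
```

The function takes a mapping from participant to position, so a snapshot of the board can be tallied at any point in the discussion without committing anyone to a formal vote.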

One of the important issues in the position-oriented discussion interface is the 
choice of labels for each axis. The selection of these labels entails the generation 
of opinion space, and this in itself is often the point of controversy. To address 
this problem, a hierarchical issue structure may be created, as an initial road map 
for the discussion, from which labels for the axes can be selected based on an is- 
sue and one of its sub-issues, recording the outcomes as the discussion proceeds. 

The discussion system itself is designed to include features such as topic ex- 
traction, participant clustering, participation induction, and links to community in- 
formation. We believe these features would enhance the accessibility to discus- 
sions when they grow large. In contrast to the IBIS-family of discussion systems, 
this interface guarantees the simple snapshot of the state of discussions in terms of 
opinion space, rather than an ever-expanding list of text or tree. 



8.4 Towards “Social Intelligence Design” 

The theme of this paper is supporting the public in participating in discourse
concerning environmental issues for the achievement of a sustainable community.
The underlying assumption is that members of the community are motivated and
encouraged to do so but may lack the means and opportunities; hence the
development of systems support for enabling public discourse. When we observe the
existence of numerous discussion groups and mailing lists on the Internet, it might
appear that people already do have the means and opportunity to express their
views and carry out discussions. However, if we are aiming to support
deliberative processes for community-based decision-making, we must at the same
time consider how a community can be enabled to carry out such a process.

Among the projects that share a similar aim is RuleNet [11], an experiment
in supporting consensus building using an electronic conference, commissioned by
the U.S. Nuclear Regulatory Commission (NRC) and involving non-technical members
of the public on a topic which was until then thought to be highly technical and
off limits to them. Participants’ evaluations were reportedly highly positive,
primarily because the experiment made them feel that their voices were heard and
their contributions had an effect. However, NRC itself questioned the credibility
of the discussions and the participants’ qualifications. There was also a sense of
mistrust among some participants towards NRC concerning its motivation, and the
technical staff doubted that anything new had been raised. Interestingly, this
represents the typical stances of parties from different sectors involved in
social decision-making, regardless of whether it is conducted with or without
electronic conferencing. Therefore, the problem lies not in the means but in the
capacity of the community to attempt consensus-oriented decision-making.

What is required is “social intelligence”, the potential capability of a community to
engage in an informed deliberative decision-making process. To design social
intelligence, then, would be to enable the public to engage in discourse. For this
purpose, we are attempting to develop means to tap into community information held by
individual members, and experimenting with a new form of interaction in carrying out
discussions. Both approaches require a sense of cooperation: in the former, there
must be a motivation among members of the community to gather and share information;
in the latter, the participants are expected to engage in a constructive deliberative
process and to accept the assumption that deliberation is cooperative work. Ideally,
members of the community should be engaged in public discourse as an everyday
activity without being conscious of its cooperative nature. One way to achieve this,
we believe, is through providing an accessible interface to ease individual
interaction, and designing interaction that enhances awareness of others, fostering
reconciliation of differences.

Our research results are still too preliminary to judge whether such social
intelligence design is possible, and extensive experiments and evaluations are
required to draw any concrete conclusions. This will be our focus upon the
implementation of prototype systems.



8.5 Concluding Remark 

Admitting that the visions and projects described in this paper^ are rather
exploratory and speculative, we believe that fostering public discourse addresses a
wide range of issues in supporting interactions in a community. On the more practical
side, we have initiated a pilot project that attempts to develop a system that
supports analytic deliberation by integrating a set of analytical tools into a
networked discussion system, including access to simulators that model environmental
effects to be used as justifications in the discussion. It is envisioned that the
approaches described in this paper will enhance such a system and contribute to it as
enabling technologies for public discourse in achieving environmentally sustainable
communities.

^ The projects described in this paper are funded by JSPS Grant-in-Aid for Scientific
Research and JSPS Research for the Future Program. We acknowledge Akira Kawaguchi
and Toshiyasu Murayama, who are developing parts of the systems that are referred to
in this paper.

It is often said that the creation of a sustainable community involves capacity
building. We believe that “social intelligence design” is a form of capacity building
that enables the public to engage in discourse among various value judgments and
perspectives, and increases awareness about the environment, including its
inhabitants and policy makers, in order to achieve rational consensus-based social
decision-making.



9.1 Introduction

It is the very purpose of the DEMOS¹ project, the subject of this paper², to exploit
novel forms of computer-mediated communication in order to support democracy on-line
('e-democracy') and to enhance citizen participation in modern societies.

In this paper we will first point out how DEMOS aims to support the democratic
process by exploiting the communicative potential of the Internet. Secondly, we will
introduce a novel participation methodology which is derived from different social
science approaches. Thirdly, we will briefly describe the overall design approach.

¹ DEMOS (Delphi Mediation Online System) is funded as a shared-cost RTD project under
the 5th Framework Programme of the European Commission (IST) and is being developed
by a research consortium comprising eight organisations from five different European
countries, representing the fields of academic research, multimedia, software, market
research and public administration. The DEMOS Project (IST-1999-20530) commenced in
September 2000 and will run for 30 months. For more information see the project web
site: http://www.demos-project.org

² This report describes the entire spread of the ongoing project and has to be seen
as a short introduction to the particular fields of research and development which
are pulled together in DEMOS.



9.2 Online Support for Democratic Processes

Since the neologism 'e-democracy' refers to both computer-mediated communication and
democracy without specifying the underlying concepts, there is a need to explain what
exactly we mean when using the term. To start with the ‘democracy’ part of the term,
there are different conceptions of democracy and, depending on the perspective,
different perceptions of how the Internet could support, reform or even revolutionise
the way democracy works. The most common distinction in the definition of democracy
refers to the ways citizens participate in the decision-making process, and the
respective types are called direct or representative democracy. However, these
approaches have to be understood not as alternative, opposing systems of democratic
governance but as two complementary forms of participation which exist side by side
in every modern society. It would simply not be feasible in contemporary societies to
ask people for their approval before coming to any decision, as the ancient Greeks
did, nor could representative democracy dispense with the civil engagement of the
citizens. Mostly, ‘e-democracy’ in this context calls into question the appropriate
mixture of both types of participation, not representative democracy as such. Whether
or not more direct participation is perceived as being desirable depends on the
underlying normative model of democracy.

For the liberal, democracy operates by arranging compromises between citizens 
with different interests on the basis of fair procedures such as equal voting rights. 
The normative implications are low and the liberties of the citizens are above all 
defined as 'negative liberties' in the sense of them not being too much directed by 
the state. From this point of view, more direct participation is only worthwhile - if 
at all - in terms of plebiscites but not in terms of intensified public debate. In this
case e-democracy would only mean substituting paper-based procedures with 
electronic ones in order to increase convenience and efficiency. By contrast the 
republican approach to democracy believes that "the formation of the citizen's 
opinion and will forms the medium through which society constitutes itself as a 
political whole" (Habermas 1996, 26). Especially in its communitarian reading the 
republican view tends to over-conceptualise ethical values and the need and 
chances for ethically integrated societies. Although the Internet could here be
potentially used in its entire diversity in order to support public democratic proc- 
esses, the expectation that electronic networks will leverage the ethical integration 
of society seems to be far too idealistic. Though there might be a "trend towards 
more autonomous local units and the emergence of multicultural and more egali- 
tarian politics, (...) strong counter-tendencies are at work. The Internet is involved 
in this process by both influencing the desired ends and their opposites" (Sassi 
1997, 436). 

Instead of identifying democracy merely with voting, as liberal democrats tend to do,
or reducing political to ethical questions, as republican democrats are supposed
to do, a third variant, the discourse theoretic (deliberative) model, focuses on the 
procedures of public will formation. These procedures are considered to generate 
legitimacy and practical rationality (Benhabib 1996, 71). "In agreement with re- 
publicanism, it gives center stage to the process of political opinion- and will- 
formation" (Habermas 1996, 27) but without burdening this process with the ide- 
alistic expectation of enabling the public sphere itself to act. According to the dis- 
course theory this quality belongs exclusively to the realm of the specialised sub- 
system called administration. The purpose of the deliberative process is, though, to 
influence the exercise of power by the administration. "The power available to the 
administration changes its aggregate condition as soon as it emerges from public 
use of reasons and communications that do not just monitor the exercise of politi- 
cal power retrospectively, but more or less program it as well" (Habermas 1996, 
24). In this sense, the project strives to strengthen the legitimacy and rationality of 
democratic decision making processes by using DEMOS to inspire and guide 
large scale political debates, to close the distance between political representatives 
and citizens, experts and laymen. 



9.3 A Novel Participation Methodology

The specific communication potential of the Internet can be characterised by the 
three terms - interactivity, speed and scope. Together, these characteristics allow 
novel forms of interactive communication between large numbers of participants. 
On the one hand, it is theoretically possible for an unlimited number of people to 
discuss a common subject - all ‘talking at the same time’ and contributing to the
same discussion. On the other hand, the same participants could also potentially 
use electronically available information to deepen their knowledge, to give more 
evidence to their arguments or to convince other participants. Furthermore, people 
could form coalitions by getting in touch with like-minded people effortlessly or 
they could group around and discuss certain topics or subtopics of mutual interest. 
To realise this potential however, there is a need for methodologies that match the 
media. They need to be able to aggregate and interrelate the individual contribu- 
tions, to identify and foster the most promising aspects of the discussion, to profile 
different positions and to strive for convergence between them or at least to figure 
out what are the truly disputed aspects where no compromise can be achieved. In 
the latter case, we are always looking for a result from the discussion - whether it
is a consensual statement supported by a majority of the participants or what is
called a ‘rational dissent’³. Only if the discussion leads to a result is it
likely to have any influence on political decision-making procedures. This impact,
of course, can be manifold: if the outcome is a clear statement supported by the
broad public, it will not be ignored by elected representatives. If the result is
merely a widespread collection of different viewpoints, it can serve as input to
prospective laws or it can anticipate future objections to planned policies and the
like. Taking a closer look at this methodology, we are basically planning to assemble
and integrate three well-proven social research methods, namely the Survey technique,
the Delphi approach⁴ and the Mediation method⁵. The difficulty here is that these
ideas cannot simply be added and compiled to form a new methodology because they are,
at least partially, contradictory.

³ “A rational dissent (...) implies that, on the basis of what is or has been
collectively accepted, the persons involved succeed in understanding precisely what
isn’t collectively accepted” (Miller 1992, 14).

⁴ As an overview see Florian et al. 1999.

⁵ The mediation method is one of the so-called Alternative Dispute Resolution (ADR)
procedures, which focus on 'informal participation' in the sense that they are not
regulated by law. See Susskind and Cruikshank (1989), Maerker and Schmidt-Belz (2000).

Starting with the classic Survey technique, this method is designed for
representative opinion polls and contributes to public opinion formation on a
large-scale basis by including (virtually) the entire population. However, this
technique is rather unsuitable for interactive participation. Delphi polls, on the
other hand, operate with a certain amount of interactive feedback, but this has the
consequence of limited scalability. For DEMOS, Delphi polls are extremely interesting
because they can be used to exploit expert knowledge. The basic idea is to generate a
consensus among a limited number of domain experts by aggregated feedback. Feedback is
supplied by the ‘Delphist’ on a strictly anonymous and statistical basis to exclude
direct personal influence among the participants. A Delphi process runs through two
(or more) cycles of interview-feedback-interview. After each cycle the experts are
asked to rethink their original answers in the light of the statistically aggregated
‘group opinion’ that has emerged in the previous cycle, until a satisfactory level of
convergence or (statistical) consensus is reached.
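
As a rough illustration of this cycle (and not a description of the DEMOS
implementation), the following Python sketch runs repeated feedback rounds over
numeric expert estimates. Several things here are assumptions made only for the
example: the median stands in for the aggregated ‘group opinion’, the interquartile
range stands in for the ‘level of convergence’, and the hypothetical `revise` rule
lets every expert move a fixed fraction of the way towards the median.

```python
from statistics import median, quantiles


def interquartile_range(values):
    """Spread of the middle half of the answers (assumed consensus measure)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def revise(answer, group_median, pull=0.5):
    """Hypothetical revision rule: move part of the way towards the group median."""
    return answer + pull * (group_median - answer)


def run_delphi(initial_answers, tolerance=1.0, max_cycles=10):
    """Repeat interview-feedback-interview cycles until the spread is small enough.

    Returns the final answers and the number of feedback cycles that were run.
    """
    answers = list(initial_answers)
    cycles = 0
    while interquartile_range(answers) > tolerance and cycles < max_cycles:
        # Feedback is anonymous and statistical: experts only see the aggregate.
        group_median = median(answers)
        answers = [revise(a, group_median) for a in answers]
        cycles += 1
    return answers, cycles


if __name__ == "__main__":
    final_answers, cycles_run = run_delphi([10, 12, 15, 20, 30])
    print(cycles_run, median(final_answers))
```

In a real Delphi study the revision step is of course performed by the experts
themselves; the fixed `pull` only serves to make the loop and its stopping
criterion concrete.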

Whereas both Survey and Delphi are quantitative methods, the Mediation tech- 
nique is a qualitative method used to reveal problems and resolve conflicts. The 
basic idea of Mediation is that consensus is not a statistical figure but a negotiated 
compromise. Mediation is a group process with a limited number of participants, 
chaired by an impartial mediator, and often running through several cycles of open 
discussion. It is highly interactive and participative, but being restricted to face-to- 
face interaction, it is unsuited for large numbers of active participants. 

The challenge for the DEMOS project is to take the advantages of all three meth- 
ods and combine them into a new methodology for on-line democratic participa- 
tion and interactive conflict resolution. (1) From Surveys it will take the idea of 
mass opinion polls on a large-scale basis, (2) from Delphi it will take the idea of a 
cyclical decision process exploiting expert knowledge, and (3) from Mediation it 
will take the idea of an open process of participative conflict resolution. 

The incompatibilities mentioned earlier can be eased by enriching each of the
particular methods with elements borrowed from the others. For example, instead of
conducting a standardised survey with pre-formulated questions, the items can be 
generated ‘bottom up’ by sorting and aggregating qualitative semantic content 
from earlier or ongoing discussions. The generation of the questionnaire, then, is 
conceptualised as an interactive process. Like conventional surveys, the main pur- 
pose here is to condense and aggregate information and beyond that to summarise 
the discussion at a certain stage. Accordingly classical Delphi studies can be sup- 
plemented with qualitative, open ended questions and extended to involve higher 
numbers of participants. On the other side, the Mediation method has to be 
adapted to the specific constraints of the Internet, that is mainly to develop func- 
tional equivalents which transfer the method’s core strengths, like creating an at- 
mosphere of confidence and trust from face-to-face interactions, to the on-line 
domain. 

The three social research methods (Survey, Delphi and Mediation) will be applied 
and merged together in the so-called 'DEMOS process'. This process is always 
concerned with one main topic to be commonly discussed on a limited timeline 
under the guidance of on-line moderators. To limit the debate to not more than one 
main topic is a conceptual decision derived from the general objective of the proj- 
ect to concentrate on deliberative discourses with potential impact on public deci- 
sion making process. It also serves to discourage debates from losing any sense of 
direction. As a matter of course several processes can be conducted in parallel and 
each of them will split up into different subtopics during the course of the debate. 
To focus on just one main topic requires a careful selection of the topic to be dis- 
cussed on the basis of general criteria. Within our research project we have found 
that a potential theme should at least meet criteria like popularity, complexity, 
controversy and persistency. The question of to what extent a DEMOS process af- 
fects ‘real-world’ decisions implies additionally a question relating to the general 
success of public discourses, which cannot be expanded on here. 

The basic process model comprises three different phases each with specific goals.
The first phase has above all to initiate, facilitate and broaden the debate and sub- 
sequently to identify the most important aspects or subtopics of the chosen subject 
matter. Therefore the moderators have to analyse and cluster the free text contri- 
butions in order to find out the issues most participants seem to be interested in. 
These tasks will be supported both on a methodological and technological level. 
The moderators will be backed up by qualitative methods of content analysis and 
can exploit various mechanisms relating to the social system’s self-organisation. 
A good example of the latter is the detection and use of the thread-generating parts 
of the discussion. Here a text mining tool will be able to automatically group the 
text contributions once a set of categories (subtopics) are defined and illustrated 
by examples. 

Additionally, the moderators will have to summarise the discussion during the 
course of the first phase following a specific procedure. These summaries consist 
of content and progress related parts and highlight and profile emerging lines of 
conflict according to the Mediation method. The first phase finally results in a set 
of proposed subtopics that can be more intensively discussed in separate discus- 
sion forums in the next phase. Since this procedure is relying on interpretations of 
the individual postings as well as of the entire discussion, the result may not ex- 
actly meet the preferences of the participants. At this point the Survey method 
comes into play in order to evaluate whether or not the proposed sub-forums meet 
the demands of the community and if necessary, to generate ideas on how to re- 
vise the list of subtopics. 

In the second phase a limited number of sub-forums will be offered by the system 
on the basis of the poll results. The purpose of this phase is to intensively discuss 
specific aspects in smaller groups of interested participants, while the main forum 
still catches those participants who want to discuss the topic on a more general 
level. Again the moderators will have to summarise the developing debate on a 
regular basis and at the same time try to tease out and manage emerging conflicts. 
This is where the Mediation method comes in as part of the moderator’s task will 
be to clarify how and to what extent people are agreeing or disagreeing and at the 
same time to reduce the distance between diverging positions by deliberative, 
moderated discourses. The results of the second phase should either be agreement 
(consent) or a rational dissent in the sense explained above. If required and appro- 
priate, this opinion shaping process can be enriched and supplemented with expert 
knowledge by conducting Delphi surveys among a predefined set of domain ex- 
perts. Delphi type studies can either be applied in the original fashion e.g. to re- 
duce the uncertainty with respect to future developments or in order to evaluate 
certain positions of the community from an expert point of view. Since even ex- 
perts are often not of the same opinion the Delphi method here provides the par- 
ticipants with a condensed picture of their degree of agreement regarding specific 
issues. Finally the moderators will close this phase with a summary of what was 
discussed so far, and will once again ask the participants for their approval (sur- 
vey). 

The third phase reintegrates the sub-forums into the still existing main forum by 
transferring the summaries and related survey results. Here the participants have 
the opportunity to see the particular subtopic as part of the general subject matter 
and a ‘big picture’ will emerge. Participants have the last chance to comment on
the main topic and the assembled results of the sub-forums and the community 
will be asked to rate the subtopics in terms of importance for the main topic that 
the DEMOS process was intentionally set up for. The final result will be a con- 
densed document depicting both the results of a dynamic and deliberative discus- 
sion and the importance accorded its different aspects in the view of its partici- 
pants. 



9.4 System Design 

The design approach for the DEMOS system started with the deduction of the ge- 
neric DEMOS process from the participation methodology as described in the 
previous chapter. Accordingly the graphical user interface (GUI) depicts the main 
characteristics of this process, e.g. visualises the different phases within a given 
time limit, diverse discussion forums and user roles. The navigational concept is 
based on a timeline, which allows the user to discern the current phase of the dis- 
cussion, and the actual topics. Starting from there, users can zoom successively 
into the focus of their interest, that is, into sub-forums and postings. The number 
of sub-forums is limited by the demands of screen design and usability. 

In order to technically support the DEMOS process, the system architecture con- 
sists of four major support components for the modules: Argumentation and Me- 
diation (A&M), Online Delphi Surveys (ODS), Subgroup Formation and Match- 
making (SFM) and Knowledge Management System (KMS). 

The main element of DEMOS is the forum, where topics are discussed under the guidance
of a moderator. The discussion forums of the Argumentation and Mediation module are
provided by the Zeno system (Gordon et al. 2001). Zeno provides particular support to
trusted third parties (e.g. the impartial mediator) responsible for moderating the
discussions. The Zeno server is a Java-based application for the WWW, which enables
and facilitates moderated, issue-based discussion forums in a secure environment.
Zeno discussion forums are integrated with a workspace facility for sharing
classified documents.

The Online Delphi Survey module provides the moderators with means to gener- 
ate and conduct on-line surveys as previously described. In a first step, a discus- 
sion will be analysed qualitatively and categorised with the help of a text data 
mining algorithm based on standard Bayesian inference methods. This engine is 
able to extract the ‘concepts’, or main ideas out of a free text and to search for 
‘similar texts’ based on comparison of these concepts. Once the moderator has 
clustered the contributions of the users and so preliminarily structured the
discussion, she may generate a questionnaire and conduct a detailed quantitative survey
in order to validate her findings, clarify particular issues or focus on certain as- 
pects. Furthermore the ODS component supports Delphi surveys and the visuali- 
sation of results, which are subsequently used to further organise the DEMOS 
process and also to establish new forums and groups of users. 
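
The text only states that the categorisation engine is based on ‘standard Bayesian
inference methods’. To make the general idea tangible, the sketch below shows a
minimal multinomial naive Bayes classifier with add-one smoothing in Python: a few
example postings per subtopic are used for training, and a new posting is assigned
to the most probable subtopic. This is an illustration of the technique, not the
actual DEMOS engine; the class name, the whitespace tokenizer, the subtopic labels
and the example texts are all invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very crude tokenizer (an assumption): lowercase words split on whitespace."""
    return text.lower().split()


class NaiveBayesCategoriser:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of examples
        self.vocabulary = set()

    def train(self, category, text):
        """Add one example posting for the given category (subtopic)."""
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def log_score(self, category, text):
        """Unnormalised log posterior: log P(category) + sum of log P(word | category)."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[category] / total_docs)
        counts = self.word_counts[category]
        denominator = sum(counts.values()) + len(self.vocabulary)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denominator)
        return score

    def classify(self, text):
        """Return the trained category with the highest score for this text."""
        return max(self.doc_counts, key=lambda c: self.log_score(c, text))


if __name__ == "__main__":
    categoriser = NaiveBayesCategoriser()
    categoriser.train("traffic", "more bus lanes and fewer cars in the city centre")
    categoriser.train("traffic", "cars cause noise and congestion")
    categoriser.train("housing", "rents are rising and flats are scarce")
    categoriser.train("housing", "we need affordable flats for families")
    print(categoriser.classify("too many cars and too much congestion"))
```

Defining subtopics by a handful of example postings, as in the `train` calls above,
mirrors the way categories are said to be ‘defined and illustrated by examples’
before contributions are grouped automatically.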

The clustering of users is crucial for the scalability of the system. It will be
handled by the Subgroup Formation and Matchmaking module, which makes use of
different profiling information. To maintain scalability on the technical level, SFM
is also based on the categorisation tool. The first, limited deployment of the system
will lead to a deeper understanding of the users' behaviour in the DEMOS environment.
Once the behaviour of users and the rules are known precisely, further tasks can be
automated. It is planned to represent users as well as forums with software agents.
These agents will carry a set of rules derived from the first, 'manual' deployment of
DEMOS, which will allow forum agents to match like-minded users and experts, user
agents to identify appropriate forums and users to set up their own groups and forums
in line with the progress of the main process. In other words, more and denser
interaction between a large number of participants can be realised with the help of
software agents in the context of DEMOS than in any real-world environment. This can
be labelled 'interactive mass communication', which denotes a new interaction type
owing to the diffusion of the web. Previously, it was part of the very definition of
mass media that interaction between sender and receiver was inhibited by interposed
technology (Luhmann 2000). As new means of communication and interaction induce new
and unexpected forms of behaviour, we furthermore expect to observe emergent
structures in this 'hybrid society'
which may also be of interest for basic research problems like the so-called
'micro-macro link'. This problem is of crucial importance for both sociology and
computer science⁶ and is a particular focus of the recently established research
field of 'socionics' (Mueller et al. 1998, Malsch 1998).

The agents' ability to learn will finally be used for the Knowledge Management System
(KMS). As described above, the categorisation engine will enable agents to search for
‘similar texts’ based on comparison of extracted concepts. In particular, this allows
agents to find documents even if they do not contain a desired keyword. The agents
can then be used to represent a particular set of documents covering a certain
subject matter. Once the participants have been provided with a couple of initially
trained agents, they can further modify their personal copies by retraining.
Furthermore, the agents can be shared among the users, so that participants will not
have to start their own research from scratch, but can retrain an existing agent and
so reuse the expertise of others⁷. With the anonymous exchange of agents bound to a
certain topic, even users with contradictory theories or opinions can mutually
benefit from their respective research by using foreign agents. Even if the agents
are not perfectly trained with respect to the information needs of particular users,
they may at least set those users on a new track. The main idea is to enable
“communication through shared knowledge” (e.g. by exchanging agents), which was one
of the initial ideas of Tim Berners-Lee (1997) when developing the World Wide Web.

⁶ E.g. in the field of ‘Distributed Artificial Intelligence’ (Gasser 1991).

⁷ This concept was initially developed in the project www.estonia-sinking.org (funded
by the Media II program of the EC), where users and groups with different (even
contradictory) interests and prior knowledge can conduct their research about the
reasons for the sinking of the ferry Estonia.



In this paper, I discuss how to evaluate computer network tools which support
communication among community members. So far, standard methods do not seem to be
developed enough to evaluate such tools appropriately. How should we evaluate network
communication tools designed to support social intelligence and to facilitate
knowledge creation in a community? I propose some important points which should be
taken into account when evaluating tools, and discuss some evaluation methods by
introducing our trials to estimate the effect of the Public Opinion Channel (POC) on
knowledge creation [10.1, 10.2, 10.7].



10.1 Computer Networked Community as Social Intelligence

First of all, in order to discuss social intelligence design, I propose a viewpoint
that considers societies and communities (especially computer-networked communities)
as having a kind of intellectual existence. This viewpoint enables us to apply useful
research interests, theories and methodologies from studies of human intelligence to
the discussion. It allows us to define the terms social intelligence and social
intelligence design as follows^. Social intelligence (SI) is defined as the ability
of communities to solve various problems. Social intelligence design is defined as
the design of those mechanisms of communities which are related to the intellectual
activities of the communities and their members. For instance, designing SI may mean
arranging channels of information to facilitate knowledge creation by communities and
their members.

The viewpoint mentioned above generates new research interests concerning SI, such as
the following:

- Does SI develop? Does the development of SI relate to the development of
  communities?

- What types of network communication systems support SI?

- Can SI be divided into subcategories? Can we apply the distinctions used in
  psychology, for example, fluid intelligence and crystallized intelligence?

- Can an SI quotient (SIQ) be measured? Can we create SIQ measurement tests?

^ Some researchers may define the term social intelligence as an ability to get along
with others, or as the objects which have this kind of ability [10.6]. Of course, it
is very important to discuss how to design this kind of object. But in this paper, I
do not use the term SI in this manner.

It is worth dealing with each of these issues, and there are further research issues
to be considered as well. In this paper, I focus on just one of them. That is, I
discuss how to evaluate a social intelligence design: concretely, how the development
of SI can be measured when a community adopts a new network communication tool.

In the following sections, I emphasize three points. First, to evaluate tools, a
baseline “control” condition should be set up appropriately. The effects of tools can
be measured by comparing the case in which the tools are used with a baseline
condition. Second, to evaluate tools, several different types of methods should be
used together. In particular, researchers should never evaluate tools based only on
users' subjective judgments obtained by questionnaires, estimations, and
introspections. Third, I discuss the possibility of applying network analysis to
investigate how community members interact with each other and how knowledge creation
is facilitated.



10.2 The Importance of Control Condition in Evaluating Social Intelligence Design

Various types of network communication tools have been proposed. Some of these tools
aim to support knowledge creation, and some aim to support communication among
community members. If tools achieve their goals, their mechanisms can be applied to
the development of new tools. If, on the other hand, they do not achieve their goals,
they should be improved. To estimate whether a tool fulfils its function, the
performance of a community or of its members should be compared between the case in
which tools with the function are used and the case in which tools without the
function are used. The case in which members use tools without the function is called
the control condition. When the control condition is biased, the effect of the
function built into the tools cannot be estimated accurately. Thus, it is very
important for estimating a tool's effectiveness that control conditions are set up
appropriately. Furthermore, when no control condition is set up, the possibility
cannot be ruled out that the community and its members would have achieved the same
performance even without the tools. In some studies, however, tools seem to be
evaluated without setting up a control condition. This is not enough to decide
whether the tools really support the activities of a community and its members.

How should a control condition be set up? One appropriate method is to
design tools as a composition of a basic part and some additional parts.
The case in which people use a tool consisting of only the basic part
constitutes the control condition, and the cases in which people use tools
consisting of the basic part plus some additional parts constitute the
experimental conditions. The effect of a tool is observed as the difference
between the control condition and the experimental conditions. For example,
the Public Opinion Channel (POC), which my colleagues and I are developing
and studying, is designed in this way [10.1, 10.2, 10.7]. POC is an
interactive community broadcasting system. POC collects information from
community members, edits and summarizes it, and broadcasts it as a story.
Community members listen to the story and respond to it. By repeating this
cycle, POC creates continuous information circulation in a community. To
evaluate the functions of POC, the case in which a system with only the
basic functions (collecting messages and broadcasting them) is used is set
up as the control condition. Research issues on POC include “How should
information be summarized to facilitate knowledge creation in a
community?” and “Are anonymous communication systems effective in
preventing communication troubles such as flames?” The cases in which POC
is used with additional functions reflecting these issues serve as
experimental conditions. One possible experimental condition would be a POC
with a summarization function. The effect of the summarization function
could be observed by comparing measurements, for example, the quantity of
message circulation and the rate of increase in users, between the control
condition and the experimental condition. In a similar way, different
modules that aim to implement the same function can be compared.
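
The comparison between a control condition and an experimental condition
can be sketched as follows. This is only an illustration, not the procedure
actually used to evaluate POC: the daily message counts are made-up numbers,
and the permutation test is one assumed way of checking whether the observed
difference could plausibly arise by chance.

```python
import random
from statistics import mean


def mean_difference(control, experimental):
    """Effect estimate: mean of experimental minus mean of control."""
    return mean(experimental) - mean(control)


def permutation_p_value(control, experimental, n_resamples=5000, seed=0):
    """One-sided permutation test for the difference in means.

    Returns the fraction of random relabelings whose difference is at
    least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean_difference(control, experimental)
    pooled = list(control) + list(experimental)
    n_control = len(control)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean_difference(pooled[:n_control], pooled[n_control:])
        if diff >= observed:
            hits += 1
    return hits / n_resamples


# Hypothetical messages per day (illustrative values only).
control_counts = [12, 15, 11, 14, 13, 12, 16]        # basic functions only
experimental_counts = [18, 17, 21, 16, 19, 22, 20]   # basic + summarization

effect = mean_difference(control_counts, experimental_counts)
p_value = permutation_p_value(control_counts, experimental_counts)
print(f"difference in mean messages/day: {effect:.2f}")
print(f"permutation p-value: {p_value:.4f}")
```

The same two functions could be reused for other measurements, such as the
rate of increase in users, as long as both conditions are measured in the
same way.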

Tools are not always designed as modules. As another way to set up a
control condition, typical situations can be used in which people use
ordinary network communication systems such as mailing lists, bulletin
board systems, and chats. For this purpose, it is useful to define typical
situations and to standardize the procedures for collecting and analyzing
data. Fujihara and Miura observed search engine users searching for
information on the WWW and analyzed their behavior [10.4, 10.5]. In that
research, they proposed categories for describing information search
behavior on the WWW with search engines. Such research would reveal our
common activities in network communities and give us a baseline for
evaluating novel network communication tools.



10.3 How to Evaluate POC 

Methodologies for estimating whether network communication tools facilitate
knowledge creation can be classified into the following three categories:

— analyses of users’ subjective estimations and introspection collected through
questionnaires

— log analyses of users’ behavior in natural conditions 

— experimental methods

Analysis of users’ subjective estimations and introspection is a very
effective method because it is easy to carry out and directly gives us rich
information on users’ thoughts. On the other hand, the data can easily be
biased by the subjectivity of users and experimenters. Some researchers have
reported that users do not always recognize their own behavior accurately
and that their subjective judgments and behavior sometimes diverge [10.8,
10.9]. Log analyses and experimental methods compensate for this
methodological problem because they give us information about users’
behavior. Of course, these methods have problems of their own: they are
difficult to carry out, and they show us only a small part of the facts
about tool usage. To evaluate network communication tools, it is therefore
necessary to use these three methods together.

My colleagues and I are now evaluating POC with these three methods. Among
these evaluations, I focus here on the results of the log analysis, because
we are trying to develop a method for analyzing network communication tools
and the knowledge creation that arises in network communication, namely an
application of the method called network analysis [10.10].

Network analysis is a method for analyzing relationships among community
members or among companies. It is mainly used in the field of sociology. It
describes networks as graph structures (Figure 10.1).

Fig. 10.1. Graph structure of network analysis.

Each node, drawn as a circle, represents a person or a company, and each
link represents a relation between people or companies. Network analysis is
used to investigate the structures of networks, the effects of network
structures on community members, and the mechanisms behind them. Several
quantification methods have been proposed. One representative measure is
degree, the number of links each node has. In particular, the number of
links coming into a node is called its in-degree, and the number of links
going out from a node is called its out-degree. In the case shown in
Figure 10.1, the in-degree is 3 and the out-degree is 2.
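
As a small illustration of these measures, the sketch below counts the
in-degree, out-degree, and degree of each node in a directed graph. The
node names and links are hypothetical; they are chosen only so that one
node has three incoming and two outgoing links.

```python
from collections import Counter

# Directed links as (source, target) pairs. Node "X" is given three incoming
# and two outgoing links; the node names are illustrative assumptions.
links = [("A", "X"), ("B", "X"), ("C", "X"), ("X", "D"), ("X", "E")]


def degree_table(links):
    """Return {node: (in_degree, out_degree, degree)} for a directed graph."""
    in_deg = Counter(target for _, target in links)
    out_deg = Counter(source for source, _ in links)
    nodes = set(in_deg) | set(out_deg)
    return {
        node: (in_deg[node], out_deg[node], in_deg[node] + out_deg[node])
        for node in nodes
    }


table = degree_table(links)
print(table["X"])  # (3, 2, 5)
```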

I constructed a network by treating each message sent to POC as a node.
There are several possible ways to link the nodes. For example, a message
can be linked to the messages that reply to it, or messages sharing the
same topics can be linked. Here, I adopted the latter approach: messages
sharing two or more content words (almost all of which were nouns) were
linked. Of all the messages (about 1530), the first 100 were used to make a
graph structure (Figure 10.2).

Fig. 10.2. Progress of number of logs collected into POC

In ordinary network analysis, each node represents a person or a company,
but in this analysis, each node represents a message. This may be
characteristic of our analysis, that is, an analysis of knowledge
creation^.
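
The linking rule (two or more shared content words) can be sketched as
follows. The messages and their content words are hypothetical, and
directing each link from the earlier message to the later one is an
assumption made for this sketch, not something stated for the POC analysis.

```python
from itertools import combinations


def build_message_network(messages, min_shared=2):
    """Link every pair of messages sharing at least `min_shared` content words.

    `messages` maps a message ID to its set of content words. Links are
    directed from the earlier (smaller ID) to the later message, which is
    an assumption made for this sketch.
    """
    links = []
    for earlier, later in combinations(sorted(messages), 2):
        shared = messages[earlier] & messages[later]
        if len(shared) >= min_shared:
            links.append((earlier, later))
    return links


# Hypothetical content words; real input would come from a morphological
# analyzer applied to the message logs.
messages = {
    1: {"school", "festival", "music"},
    2: {"festival", "music", "ticket"},
    3: {"weather", "rain"},
    4: {"music", "ticket", "school", "festival"},
}

links = build_message_network(messages)
print(links)  # [(1, 2), (1, 4), (2, 4)]
```

Message 3 shares no words with the others and stays isolated, which
corresponds to a node without links in the graph.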

Figure 10.3 shows the network structure based on the first 100 POC messages.
Each square represents a message, and the number written in a square is the
ID number of the message. The smaller the ID number, the earlier the
corresponding message was sent to POC. Twenty nodes had no links to other
messages (e.g., nodes 2, 7, 20), and some formed very simple chains (e.g.,
nodes 69-70 and 33-40-44). Sixty-five messages formed a highly complex
network. Some nodes have many links and play core, central roles in the
network (e.g., nodes 53, 82, 97). Other nodes have only a few links and
play peripheral roles (e.g., nodes 5, 10, 99). The centrality of nodes can
be quantified by degree. Figure 10.4 shows the degrees, in-degrees, and
out-degrees of the nodes.

The average degree was about 9. On POC, members communicate on multiple
topics at a time, which probably leads to a smaller average degree. On
other media such as BBS, people tend to debate one fixed theme, so messages
are expected to share more words and the average degree is expected to be
larger.

Fig. 10.3. Network analysis of POC

^ Only a few studies have used network analysis to describe knowledge
representations. For example, Ferstl and Kintsch described the knowledge
representations that people construct when reading texts [10.3].

Out-degree decreased as a function of ID number, and in-degree increased.
This is reasonable in probabilistic terms. However, some messages had larger
degrees than this trend predicts, and such deviating messages could
probably be regarded as an index of centrality. Five messages had
in-degrees larger than the average plus 2 standard deviations, and six
messages had out-degrees larger than the average plus 2 standard
deviations. Both counts exceed the value expected (2.3 messages) if degrees
followed a normal distribution. These central nodes have opportunities to
connect themes that were originally unrelated to each other. The number of
such nodes may therefore reflect how well a tool facilitates knowledge
creation in a network community. If so, POC can be regarded as an effective
tool for supporting intelligence.
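
The expected value of 2.3 follows from the upper tail of the normal
distribution: about 2.3% of values lie more than 2 standard deviations above
the mean, so roughly 2.3 of 100 messages would exceed that threshold by
chance. The sketch below reproduces this arithmetic and shows one way to
flag such nodes. The in-degree values are hypothetical, and using the
population standard deviation is an assumption.

```python
import math
from statistics import mean, pstdev


def upper_tail_probability(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def expected_outliers(n_messages, n_sd=2.0):
    """Expected number of values above mean + n_sd * SD under normality."""
    return n_messages * upper_tail_probability(n_sd)


def central_nodes(degree_by_id, n_sd=2.0):
    """IDs whose degree exceeds mean + n_sd * (population) SD."""
    values = list(degree_by_id.values())
    threshold = mean(values) + n_sd * pstdev(values)
    return sorted(i for i, d in degree_by_id.items() if d > threshold)


print(round(expected_outliers(100), 1))  # 2.3

# Hypothetical in-degrees for ten messages (illustrative values only).
in_degrees = {1: 2, 2: 3, 3: 2, 4: 3, 5: 2, 6: 3, 7: 2, 8: 3, 9: 2, 10: 20}
print(central_nodes(in_degrees))  # [10]
```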



10.4 Future Works 



The analyses described above are just a first step in our trials, so many
issues remain for future work. The results need to be compared with the
results of network analyses of other media such as BBS. As POC develops,
the results of network analyses of POC with additional functions should
also be compared with the results described above, that is, those for POC
with only the basic functions.

Fig. 10.4. Degrees, in-degrees, and out-degrees for nodes.

Based on the network, there are other possible ways to investigate whether
the system facilitates our knowledge creation. For example, messages could
be classified into clusters based on their degrees. If there were links
connecting messages from different clusters, that might indicate that the
system facilitates our knowledge creation. The number of links connecting
chronologically distant messages might also be an index of knowledge
creation. Network analysis thus gives us an interesting viewpoint for
evaluating network communication tools, although we still have to elaborate
the method of applying it. We expect the analysis to be an effective way to
evaluate network communication tools and to investigate our knowledge
creation.



11. Overview

Akira Namatame 



AESCS-2001 

The first international workshop on Agent-based Approaches in Economic
and Social Complex Systems (AESCS) was initiated as a result of the growing
recognition of the importance of computational approaches to the study of
complex economic and social phenomena. The fundamental objective of
AESCS 2001 was to foster the formation of an active multi-disciplinary
community on multi-agents, computational economics, social dynamics, and
complex systems. AESCS 2001 also aimed to bring together researchers and
practitioners from diverse fields, such as computer science, economics,
physics, sociology, psychology, and complexity theory, to understand
emergent phenomena and collective behavior in economic and social systems.
We also discussed the effectiveness and limitations of computational models
and methods in the social sciences. The workshop was further intended to
raise awareness among researchers in many fields by sharing the common
view that many problems in economic and social systems will require
collective information processing by a large collection of autonomous and
heterogeneous agents.

The technical issues to be investigated include the following:

1. Formal Theories on Agent-based Approaches 

— agent-based computational foundations 

— theories on rationality, intention, emotion, social action, social inter- 
action 

— heterogeneity and diversity of agents 

2. Computational Economics and Organization 

— agent-based economics 

— market-based computing 

— artificial markets 

— agents in financial engineering 

— econophysics 

— computational organization theory 

3. Formal Theories of Social Dynamics 

— methodologies of modeling social behaviors 

— chaotic and fractal dynamics 

— dynamics of populations 

4. Collective Intelligence 

— collective decision and behaviors 

— emergent intelligence

— social intelligence

5. Related Areas

— evolutionary economics

— complexity theory

— evolutionary computation 

— evolutionary games 

We received many high-quality papers, which reflects the growing
recognition of the importance of these areas. All papers received a careful
and supportive review, and we selected 13 papers out of 27 for the
proceedings. We hope that by reading the proceedings you will share with us
the intellectual excitement and interest in this emerging discipline.
Finally, we would like to acknowledge the support and encouragement of the
many people who helped us get this new conference started.



12. Analyzing Norm Emergence in Communal
Sharing via Agent-Based Simulation

Setsuya Kurahashi and Takao Terano 

University of Tsukuba, Otsuka 3-29-1, Bunkyo-ku, Tokyo 113-0012, Japan 

This paper describes an agent-based simulation study on the emergence of
norms of communal information sharing. To carry out the study, we use our
simulator TRURL, which (1) contains software agents with decision-making
and communication functions, and (2) can evolve artificial societies with
specific characteristics defined by a given objective function, which is
optimized by genetic algorithms. Unlike the social psychology literature,
which mainly applies evolutionary game theory to homogeneous agents in
simulation, TRURL focuses on the decision-making behavior of heterogeneous
agents. Our experimental results suggest that, contrary to the results of
social psychology studies so far, free riders in a society do not cause the
norm of communal sharing to collapse when the shared properties are
information-oriented.



12.1 Introduction 

A norm in a society generally means the expected behaviors of its members,
the decision criteria of the members, and/or the evaluation criteria that
the society expects. Norms exert social pressure on people in a group to
conform. Public and private norms take various levels and forms. Examples
of such norms are (1) customs resulting from daily repeated behaviors,
(2) morality as a criterion of right and wrong, and (3) the law as a public
force.

In this paper, we focus on the communal sharing norm. By the communal
sharing norm, we mean that people share their resources with one another.
Such sharing of resources plays an important role as a reciprocal norm in
human behavior. Communal sharing encourages us to maintain human relations
and closeness [12.1]. The resources for communal sharing include money,
physical properties, services, love, social approval, and information
[12.2]. The recent rapid development of the Internet has greatly changed
our society, which is now characterized by information networks. From this
viewpoint, this paper analyzes the birth, growth, and stability of the
communal sharing of information resources in a society.

To carry out the study, we adopt an agent-based simulation model.
Agent-based models can usually reveal macro phenomena arising from the
interactions among agents. Although a model designer knows the functions
and natures of the agents, (s)he does not know what phenomena will happen
as a whole during the simulation. Our agent-based model differs from
conventional macro models for analyzing social phenomena in the following
respects: (1) the simulation model consists of heterogeneous agents, which
have decision-making and communication functions; (2) we observe the
emergence of social phenomena as a result of optimizing a social macro
index by genetic algorithms; and (3) we analyze the emergent phenomena and
the characteristics of each agent.

This paper is organized as follows. We first discuss several existing
studies of norms. Then, we briefly describe our simulator TRURL and apply
it to the analysis of the communal sharing norm. Finally, we discuss the
effectiveness of our agent-based simulation model.



12.2 Related Work on Studies of Norms 

Norms include personal norms and group norms. They can prevent deviant
behavior through rewards and punishments in order to reduce tensions in a
group. Norms urge people to conform to common judgments and behavioral
patterns. Norms are a predominant means of controlling a society and/or
firms. We classify studies of norms into the following areas.

Economic institution analysis. Economic institution analysis usually uses
evolutionary game theory. Researchers in this area have discussed the
emergence and stability of diverse economic institutions [12.3, 12.4, 12.5].
Their basic technique, evolutionary game theory, analyzes economic
institutions based on the concept of the Evolutionarily Stable Strategy
(ESS). Using this concept, they have described the stability of economic
institutions, path dependency, and the complementarities of institutions.
Aoki [12.6] has found two institutions of corporation systems as
equilibrium points of the evolutionary game. This approach is applicable to
analyzing the emergence and stability of economic institutions concerning
norms; however, it considers neither dynamic interactions among agents nor
the agents’ mutual understanding of each other’s internal models.

Social network. In social network research, graph theory is often used. The
center of an organization and hidden relations among its members are
discussed using graph-theoretic mathematical models [12.7]. For network
structures and protocol analysis of electronic communities, socio-metric
measures such as the degree to which leadership exists have been proposed.
They show the birth, growth, and maturity of norms in electronic
communities.

Social psychology and cognitive science. In social psychology, norms of
human behavior have been investigated with various data from psychological
experiments. The processes by which norms form have been analyzed
experimentally in terms of leadership [12.8], the effects of group pressure
[12.9], and the influence of a consistent minority [12.10].

Intolerant members, who show an attitude of refusing to share resources,
are critical barriers against free riders. Experimental results have shown
that such intolerant members are able to inhibit the emergence of free
riders. In summary, these studies have reported that (1) intolerance is a
stability condition of a norm, and (2) a communal sharing norm emerges at
the macro level as a result of people choosing adaptive behavior at the
micro level. These studies [12.11, 12.12] offer a new viewpoint on human
group behavior: an evolutionary process of adapting to an environment
underlies the formation of a norm.



12.3 Artificial Society Model TRURL 

The roles of computer simulations in organization theory have been
re-evaluated in the social science literature. However, many of the
approaches seem to report results that are too artificial. To overcome
such problems, we have developed TRURL, a novel multi-agent-based
simulation environment for social interaction analysis. TRURL has the
following characteristics:

— The agents in the model have detailed characteristics with enough
parameters to simulate real-world decision-making problems.

— Instead of manually changing the parameters of the agents, we evolve the
multi-agent worlds using GA-based techniques.

— Each agent exchanges knowledge and solves its own multi-attribute decision
problems by interacting with the other agents.



12.3.1 Agent Architecture 

Roughly speaking, an agent in TRURL has event-action rules. Each agent
exchanges knowledge and solves its own multi-attribute decision problems by
interacting with the other agents. Predetermined parameters define the
agents’ congenital characteristics. The parameters are not changed during
one simulation, but are tuned by GA operations when the world evolves.

$$P_p = \{c_p, p_s, p_r, p_a, p_c, n, \alpha, \beta, \gamma, \delta, \mu\},$$

where $P_p$ is the gene sequence, $c_p$ is the physical coordinates, $p_s$
is the probability of sending a message, $p_r$ is the probability of reading
a message, $p_a$ is the probability of replying with a pro-or-con attitude,
$p_c$ is the probability of replying with an added comment, $\delta$ is the
metabolic rate, $\mu$ is the mutation rate of knowledge attribute values,
$\alpha$, $\beta$, and $\gamma$ are parameters, and $n$ is the number of
knowledge attributes the agent has. These parameters represent the
characters of agents.

An agent usually has only a subset of knowledge that it can use for
decision-making. The knowledge an agent has is a set of knowledge
attributes, each of which is defined as $K_d = \{N, W, E, C\}$, where $N$
is the name of the knowledge attribute, $W$ is the importance weight of the
attribute, $E$ is the evaluation value of the attribute, and $C$ is the
credibility weight of the attribute.



12.3.2 Communication and Action Energy

A communication process can be considered a decision-making process based
on conformity behavior. In this model, we define parameters of knowledge
attributes that change when an agent receives a message: weight w,
evaluation e, and credibility c. Their definitions are shown below. As a
result, the knowledge of a highly credible agent may affect a less credible
agent. When both agents have the same tendency regarding some knowledge,
their credibility increases mutually.



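$$\Delta w^{i}_{kd} = \alpha \sum_{j \in S} \left(w^{j}_{kd} - w^{i}_{kd}\right) \cdot \max\{0,\; c^{j}_{kd} - c^{i}_{kd}\}$$

$$\Delta e^{i}_{kd} = \beta \sum_{j \in S} \left(e^{j}_{kd} - e^{i}_{kd}\right) \cdot \max\{0,\; c^{j}_{kd} - c^{i}_{kd}\}$$

$$\Delta c^{i}_{kd} = \gamma \sum_{j \in S} \left(\,\cdots\,\right) \cdot \max\{0,\; c^{j}_{kd} - c^{i}_{kd}\}$$

Here $w^{i}_{kd}$, $e^{i}_{kd}$, and $c^{i}_{kd}$ are the weight, evaluation, and
credibility of agent $i$’s knowledge attribute $kd$; $\alpha$, $\beta$, and
$\gamma$ are transfer ratios; and $S$ is the set of agents who send messages
to agent $i$ in period $t$.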
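
The conformity-style update for a weight or evaluation value can be
sketched as follows. This is a hedged illustration rather than TRURL’s
implementation: it assumes an additive update scaled by a transfer ratio,
in which only senders more credible than the receiver have any influence.
The function name and the numbers are hypothetical.

```python
def conformity_update(own_value, own_cred, senders, ratio):
    """Shift one attribute value toward more credible senders.

    senders: list of (value, credibility) pairs from the agents in S.
    Only senders whose credibility exceeds the receiver's contribute,
    because of the max{0, c_j - c_i} factor.
    """
    shift = sum(
        (value - own_value) * max(0.0, cred - own_cred)
        for value, cred in senders
    )
    return own_value + ratio * shift


# Hypothetical numbers: receiver weight 0.2 with credibility 0.3.
senders = [(0.8, 0.9), (0.1, 0.2)]   # (weight, credibility) of each sender
new_weight = conformity_update(0.2, 0.3, senders, ratio=0.5)
print(round(new_weight, 3))  # 0.38
```

The second sender is less credible than the receiver, so it contributes
nothing; the receiver moves only toward the first sender’s value.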

Action energy $m$, which is an acquired parameter, increases in proportion
to the amount of information that the agent has obtained. $m$ is initialized
randomly according to a normal probability distribution. It decreases by
the metabolic rate $\delta$ when the agent sends information to another
agent. On the other hand, it increases if the agent receives valuable
information from another agent. It also decreases steadily while the agent
does not communicate with others.



12.3.3 Inverse Simulation 

In a regular simulation method, we obtain results successively while
adjusting the parameters. The inverse simulation of TRURL instead specifies
an objective function at the beginning and then searches for parameters
evolutionarily; we do not adjust them intentionally. Accordingly, we can
learn what nature or character of agents creates the organizational
structure of the society after communication.

The artificial society TRURL generates many societies with genetic
algorithms, so that it can recreate a society that is similar in terms of a
social macro index. Each society is represented as the genes of the
predetermined parameters of the agents who constitute it. The societies are
evaluated with a social macro index after interactions among the agents.
Selection, crossover, mutation, and reproduction are carried out
repeatedly. The social architecture is gradually organized with the social
index as the objective function.

Social network research has shown that the process of communication and
opinion formation in a community can be measured with a socio-metric. If
this socio-metric is used as the objective function of the artificial
society, we can recreate the same phenomenon as in a real society [12.16].
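
The idea of inverse simulation can be sketched with a toy genetic
algorithm. Everything below is an assumption made for illustration: the
“society” is reduced to a list of sending probabilities, the macro index is
simply their mean, and the GA settings are arbitrary. TRURL’s actual agent
interactions and objective functions are not reproduced.

```python
import random


def simulate_society(genes):
    """Stand-in for running a society; returns a macro index in [0, 1].

    Hypothetical toy model: the index is simply the mean sending
    probability of the agents.
    """
    return sum(genes) / len(genes)


def inverse_simulation(target, n_agents=10, pop_size=30, generations=100, seed=1):
    """Evolve agent parameters so that the macro index approaches `target`."""
    rng = random.Random(seed)

    def fitness(genes):
        return -abs(simulate_society(genes) - target)

    population = [
        [rng.random() for _ in range(n_agents)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]          # selection
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_agents)           # one-point crossover
            child = mother[:cut] + father[cut:]
            if rng.random() < 0.3:                     # mutation
                child[rng.randrange(n_agents)] = rng.random()
            children.append(child)
        population = parents + children                # reproduction
    return max(population, key=fitness)


best = inverse_simulation(target=0.8)
print(round(simulate_society(best), 2))
```

The objective is given first and the parameters are found by search; the
genes of the best society can then be inspected to see which agent
characteristics produce the target index.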



12.4 Experiments 

In this section, we describe experiments that examine whether the sharing
norm for information properties is stable. We construct three kinds of
society and experiment on the amount of information, free riders,
intolerant agents, and the information gap.



12.4.1 An Amount of Information in Each Society 

We design the following three artificial societies: 

1. Face-to-Face communication oriented society (FFS)

The communication among the agents is constrained by both the physical
and mental coordinates. Agents interact with their physical and mental
neighbors. The ratio between the two is parameterized.

2. E-Mail oriented society (EMS)

The communication among the agents is constrained by the mental
coordinates. In this society, agents interact with each other one-to-one
at each step.

3. Net-News oriented society (NNS)

NNS is an extension of EMS. It has a virtual whiteboard at the center of
the world. Agents in the world send messages to the whiteboard, and the
whiteboard distributes the messages to all the agents. The credibility
value of a message is the same as that of its sender.

We let one agent with a large amount of information participate in each
society. Figure 12.1 shows the change in the amount of information in each
society over 300 periods of communication. The Y-axis is the average amount
of information over all agents, and the X-axis is the amount of
communication. The initial values of the predetermined parameters are set
randomly. While the amount of information in FFS changes slowly, that in
EMS changes rapidly in some parts. The amount in NNS rises rapidly in one
burst and then saturates.

We consider the cause to be the restriction on information flow. In EMS,
the credibility distance determines the receiver. If the society
temporarily forms a dense group, the agents in the group communicate
rapidly, and vice versa. In NNS, if an agent sends a worthy message, the
credibility of the forum in which the agent participates increases, and the
agents communicate rapidly.

Fig. 12.1. The change of information quantity (300 terms, 30
agents/society)



12.4.2 Emergence and Collapse of a Norm 

A society with the communal sharing norm is advantageous. This advantage is
observed in the rapid increase of information in NNS. An agent tends to
send messages to Netnews because it can obtain more worthy information in
NNS than in FFS. Netnews can be regarded as a facility for sharing
information resources in a network society. This appears as the phenomenon
of agents approaching Netnews. Figure 12.2 shows the experiments in NNS.




Fig. 12.2. Occurrence and collapse of a norm in the Netnews society (left:
after 10 terms, right: after 30 terms, 30 agents/society)



The center rectangle represents Netnews. In the early stage of
communication and interaction among agents, posting and acquiring
information via Netnews increase rapidly, and the agents concentrate at
the center (the left figure). This indicates that maintaining a communal
sharing norm is advantageous for each agent.

In the second stage, in which the agents communicate frequently, however,
the agents leave Netnews (the right figure). The cause is likely the
uniformity of knowledge: it is difficult for agents to obtain new
information at this stage. This shows that if free riders, who pay no cost
for posting messages, increase, Netnews will lose its value.



12.4.3 Emergence and Control of Free Riders 

Figure 12.3 (left) shows the change in the average send-gene of all agents
in FFS. The send-gene is one of the predetermined parameters. The Y-axis is
the average sending probability, and the X-axis is the amount of
communication. The sending probability decreases slowly, which demonstrates
that free riders emerge in FFS. We found the same phenomenon in EMS. An
agent gradually loses energy through communication, while it gains energy
from worthy information. A free rider can live forever because it does not
send information and only receives it. As a result, the number of messages
sent decreases, and agents who follow the sharing norm lose energy, because
they cannot obtain worthy information even though they expect rewards in
return for sending messages. Eventually all of them would go away.

Fig. 12.3. The change of send-gene and the effect of tolerant agents.
Left: increase of free riders. Right: control of free riders (1000 terms,
average of 30 agents/society)



We extended the model so that an agent does not send a message to an agent
that is a free rider. The result of the experiment is shown in Figure 12.3
(right). Free riders can no longer obtain information, and they lose their
energy. As a result, the decrease of the send-gene is controlled in the
society. The existence of intolerant agents can thus control free riders
without explicit punishment. This demonstrates why implicit norms exist in
addition to explicit norms such as the law.



12.4.4 Information Gap

The information gap among agents and the efficiency of information
acquisition can be examined using the inverse simulation of TRURL, which
lets us learn the nature of information-rich agents. The information gap
can be measured with the Gini index. The Gini index is a statistic used in
economics to represent income gaps. A larger Gini index means a larger
income gap, that is, a greater difference in income between the rich and
the poor.




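$$G_{Gini} = 1 - \frac{\sum_{i=1}^{N} A_i \left(E_i + 2\sum_{k=1}^{i-1} E_k\right)}{E_{tot} A_{tot}}$$

Data are sorted so that $E_i / A_i > E_{i-1} / A_{i-1}$. $A_i$: “people”
(population in group $i$); $E_i$: “wealth” (the amount of information in
group $i$); $A_{tot} = \sum_{i=1..N} A_i$; $E_{tot} = \sum_{i=1..N} E_i$.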
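
A Gini index for grouped data can be computed as in the sketch below. It
uses the standard trapezoid approximation of the Lorenz curve, which is
assumed here to correspond to the measure used in the experiments; the
function name and the example societies are hypothetical.

```python
def gini_index(groups):
    """Gini index for grouped data.

    groups: list of (people, wealth) pairs, i.e. (A_i, E_i).
    Uses the trapezoid approximation of the Lorenz curve.
    """
    # Sort groups by wealth per person, poorest first.
    ordered = sorted(groups, key=lambda g: g[1] / g[0])
    total_people = sum(a for a, _ in ordered)
    total_wealth = sum(e for _, e in ordered)

    cumulative = 0.0      # wealth held by all poorer groups
    area = 0.0
    for people, wealth in ordered:
        area += people * (wealth + 2.0 * cumulative)
        cumulative += wealth
    return 1.0 - area / (total_people * total_wealth)


# Hypothetical societies of four single-agent groups (illustrative values).
equal = [(1, 10), (1, 10), (1, 10), (1, 10)]
unequal = [(1, 0), (1, 0), (1, 0), (1, 40)]
print(gini_index(equal))    # 0.0
print(gini_index(unequal))  # 0.75
```

A value of 0 means that information is spread evenly, and values closer to
1 mean that it is concentrated in a few agents.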

As shown in Figure 12.4.4, we simulated 20 societies with the Gini index as
the objective function. The resulting maximum Gini indices are 63% in FFS,
54% in EMS, and 48% in NNS, which gives the relation FtoF > Email > Netnews
(the upper part of Table 12.1).

The Netnews society has a smaller information gap than the other societies.
We observed the genes of the information-rich agents in each society. In
FFS, the rich agent sends many messages (probability 0.75) and reads them
frequently (probability 1.0). In NNS, the rich agent sends few messages
(probability 0.20) and reads them with probability 0.92.

Table 12.1. The gap and nature of information rich persons / Difference of
energies in each society

                                    FFS     EMS     NNS
Max Gini index                      63%     54%     48%
Sending probability of the rich     0.75    0.63    0.20
Receiving probability of the rich   1.0     0.94    0.92
Max energy                          54      77      165
Max energy ratio                    1.0     1.4     3.1

These results suggest the following hypotheses. In FFS, an active agent,
which gathers information by itself, sends the information, and listens to
other agents frequently, can become information-rich. In NNS, a Net surfer
agent, which sends few messages, can become information-rich: it reads
information on the Net instead of spending costs on gathering information.
EMS lies in between. In addition, we ran the simulation with an objective
function based on $m_i$, the action energy of agent $i$.

This maximizes the amount of action energy, which is the difference bet- 
ween information value and gathering cost. It indicates that the agents com- 
municate their information efficiently and represents the efficiency of the 
society to gather information. The result is shown in the under part of Table 
12 . 1 . 

NNS can gather 3.1 times as much information as FFS. Although
information-rich agents exist in NNS, its information gap is smaller than in
the other societies.
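
The ratios in the last row of Table 12.1 can be checked with a short
calculation. The sketch below assumes that each ratio is the maximum energy
of a society divided by that of FFS and rounded to one decimal place.

```python
# Max energy per society, as listed in Table 12.1.
max_energy = {"FFS": 54, "EMS": 77, "NNS": 165}

# Assumed definition: ratio to the FFS value, rounded to one decimal place.
ratios = {
    name: round(value / max_energy["FFS"], 1)
    for name, value in max_energy.items()
}
print(ratios)  # {'FFS': 1.0, 'EMS': 1.4, 'NNS': 3.1}
```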



12.4.5 Discussion 

Observing how information changes in the three societies suggests the
following: NNS has a large communication capacity, free riders occur in
every society, intolerant agents can control free riders, and the
information gap is smallest in NNS.

From the viewpoint of the communal sharing norm, the experimental results
imply the following. Information property differs in nature from physical
resources in terms of sharing: sharing and distributing information does not
reduce its property value. Netnews, which is a facility for sharing
information, can control free riders and reduce the information gap in NNS.
On the other hand, agents that do not participate in the Netnews society
might widen the information gap. The Digital Divide might be one such
phenomenon. The results may therefore call for a change in the definition of
a free rider in NNS.

Before the experiments, our hypothesis was that the communal sharing norm
would easily collapse and free riders would increase in NNS, as in an
advanced information society, because such a society would lack severe
moral controls such as punishment and intolerance. However, the simulation
results refuted the hypothesis. The information gap in NNS did not expand
beyond our prediction. Although free riders emerged in NNS, they did not
cause the norms to collapse. Then we assumed that a manager of venture type
easily gets richer than a manager of traditional type. However, the former
types have become richer than the latter types.



12.5 Conclusion 

This paper has described agent-based simulations and experiments on a
communal sharing norm. We simulated several evolutionary artificial
societies with multiple agents and observed the emergence, collapse, and
control of norms in FFS, EMS, and NNS. Using TRURL, we could analyze the
nature of social interactions in the artificial world. We have also
demonstrated that agent-based simulation can contribute to the analysis of
organizations and social phenomena.



Abstract. This paper stresses the importance of focusing on modeling
processes in order to make cumulative progress in agent-based
approaches. We introduce our approach to analyzing modeling processes
and investigate its potential for cumulative progress. The capabilities
of our approach can be summarized as follows: (1) it has great potential
to promote cumulative progress in agent-based approaches; and (2) the
elements it finds are likely to affect the real world, to be usable as
tool-kits, and to support the KISS principle.

Keywords: agent-based approach, computational simulation, cumulative
progress, modeling process



13.1 Introduction 

An agent-based approach can provide techniques and tools for analyzing
complex organizations and social phenomena. This approach explicitly
examines organizing processes and social dynamics and builds theories by
clarifying vague, intuitive, or under-specified issues in conventional
approaches. Although research on agent-based approaches has recently
attracted much attention, the approaches actually have a long history. Major
early examples include the garbage can model [13.10], the iterated
prisoner’s dilemma (IPD) [13.2], multiagent Soar [13.6], and the Virtual
Design Team (VDT) [13.17]. Following these models, several others have been
proposed, such as Sugarscape [13.13], ORGAHEAD [13.8], PCANS^ [13.15],
simulating society [13.14], and agent-based computational economics (ACE)
[13.24].

These models and methods have contributed to our understanding of complex
organizations and social phenomena. However, in Cohen’s phrase,
“disciplines or fields of study do not get much progress due to a lack of
cumulative progress in agent-based approaches [13.12].” This indicates that
agent-based approaches do not compel new investigators to build on the
accomplishments of older works, even though these previous works provided a

* Paper submitted to Exploring New Frontiers on Artificial Intelligence, Series on 
Advanced Information Processing (AIP), Springer 
^ This model is currently extended to PCANSS. 
lot of useful results and showed high potential for understanding other
important issues in organizational and social science. So, what is the main
cause of this problem, and how can we overcome it? Unfortunately, these
questions have been left behind by colorful and powerful simulations. Since
agent-based approaches cannot avoid tackling these questions, this paper
aims to summarize the factors that prevent cumulative progress in
agent-based approaches and to show that our approach can assist such
progress.

This paper is organized as follows. Section 13.2 describes the factors that
prevent cumulative progress in agent-based approaches, and Section 13.3
explains our approach toward cumulative progress. The potential and
capabilities of our approach for promoting cumulative progress are discussed
in Section 13.4, and our conclusions are given in Section 13.5.



13.2 Can We Assist Cumulative Progress? 

13.2.1 Problems in Agent-Based Approaches 

In the previous section, we pointed out that the main problem of agent-based 
approaches is the lack of cumulative progress. So, what has caused this lack of 
cumulative progress? There are many reasons. According to Cohen, “A lack 
of mathematical tools is a part of the problem. But, there are many other 
problems, including the way we train our students and evaluate research 
projects, with too little emphasis on building on what is known, and too 
much emphasis on novelty and on the promise of more powerful computation 
[13.12].” 



13.2.2 Points for Cumulative Progress 

To overcome the above difficulties, it is useful to enumerate the points
that promote cumulative progress in agent-based approaches. Considering
Cohen’s claim, the following points contribute significantly to cumulative
progress.

— (a) Common test-beds: First, sharing common test-beds is a promising
approach for cumulative progress, for two reasons: (1) common test-beds
enable researchers to narrow an argument down to concrete and detailed
issues, which helps to produce a fruitful and productive discussion; and
(2) common test-beds encourage researchers to share results, which leads to
progress in the field through comparing results or competing with other
researchers.

— (b) Standard computational models: Next, standard computational
models are necessary for cumulative progress. This is because (1)
researchers do not need to design computational models themselves, which
helps to bring researchers together toward progress in the field; and (2)
the common parts of various research efforts become clear through the
development of libraries of computational models, which provide the
essential parts of agent-based simulations.

— (c) Validation and advance of older works: Third, it is important to
validate older results and advance older works for cumulative progress. In
this case, the replication of older models is essential to validate and
advance older works. To promote this, researchers should share an
understanding of what has and has not been done in agent-based approaches.

— (d) Standard evaluation criteria: Finally, standard evaluation criteria
for results (including papers and projects) are indispensable for cumulative
progress. Although it is difficult to evaluate results appropriately, it is
important to apply the same evaluation criteria. For instance, a benchmark
for evaluation criteria would be useful for cumulative progress.

In addition to the above points, the following points are also important to 
promote cumulative progress, though they are not restricted to agent-based 
approaches: (1) regular meetings that enable researchers to constantly share 
results; and (2) appropriate teaching of students. 



13.2.3 Cumulative Progress in Current Projects 

Based on the above four points, this subsection analyzes how current agent- 
based research can achieve these goals. 

For common test-beds, the U-Mart project [13.19], for instance, is recogni- 
zed as a common test-bed for a virtual stock market in the economic field. 
Although this project began a few years ago, it has promoted cumulative pro- 
gress by narrowing arguments down to a concrete and detailed stock market 
and by sharing the results among researchers. 

For standard computational models, Axelrod and his colleagues developed
standard computational models by employing their existing models (such as
the garbage can model and the iterated prisoner’s dilemma (IPD)) [13.3].
Kurumatani is also developing libraries of standard parts in World Trade
League [13.16], which aims to provide a multiagent-based universal
environment for analyzing economic and financial systems.

Although these efforts have promoted cumulative progress, conventional
agent-based approaches have not so far fully addressed the four points.



13.3 Exploring Key Elements 

Since it is not easy to promote cumulative progress in agent-based approaches 
as described in the previous section, this paper starts by investigating how 
our approach [13.21] can promote such cumulative progress as the first stage 
of our research. 



13.3.1 Interpretation by Implementation

Outline. The Interpretation by Implementation (IbI) approach is a
trial-and-error method for seeking the underlying elements of organizations
or societies by alternating implementation and interpretation phases. The
concrete algorithm of the IbI approach proceeds as follows.

1. First, the IbI approach implements a model (i.e., model A in Figure 13.1)
while focusing on a modeling process. In this stage of the IbI approach,
the following three processes are employed: (a) concept breakdown, (b)
assumptions/premises modification, and (c) investigating layers change
(all three processes are described in detail later).

2. Next, the IbI approach interprets the results to investigate the
underlying elements that determine the characteristics of multiagent
organizations or societies.

3. If the essential elements are found, the process is finished. If not,
new models (i.e., models B, C, ... in Figure 13.1) are implemented to
investigate other elements; then, go to step 2.
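
The three steps above can be sketched as a simple loop. The following Python
code is only an illustration under our own assumptions: the function name
`interpretation_by_implementation`, the callables `implement` and
`interpret`, and the toy models are hypothetical placeholders, not part of
any published implementation of the approach.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

Model = TypeVar("Model")
Result = TypeVar("Result")


def interpretation_by_implementation(
    candidate_models: Iterable[Model],
    implement: Callable[[Model], Result],
    interpret: Callable[[Result], List[str]],
) -> Optional[List[str]]:
    """Alternate implementation and interpretation until elements are found.

    Returns the elements found by the first model whose interpretation is
    non-empty, or None if no candidate model reveals any.
    """
    for model in candidate_models:    # step 1 (model A), then step 3 (B, C, ...)
        result = implement(model)     # implementation phase
        elements = interpret(result)  # step 2: interpretation phase
        if elements:                  # step 3: stop once elements are found
            return elements
    return None


# Toy usage with placeholder callables (purely illustrative).
def run_model(name: str) -> dict:
    # Pretend that only model "B" exposes a sensitivity to execution order.
    return {"model": name, "order_sensitive": name == "B"}


def read_result(result: dict) -> List[str]:
    if result["order_sensitive"]:
        return ["execution order of learning mechanisms"]
    return []


print(interpretation_by_implementation(["A", "B", "C"], run_model, read_result))
# ['execution order of learning mechanisms']
```

In practice the list of candidate models is not known in advance; each new
model is designed after interpreting the previous results, which is why the
approach is described as trial and error.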




Fig. 13.1. Interpretation by Implementation



What is important to note here is that the IbI approach focuses on the
influence that elements embedded in a modeling process have on results.
Since these elements have a large influence on results, we must consider
this influence when employing agent-based approaches. However, it is
difficult to visualize these elements, and thus the IbI approach employs a
trial-and-error method that explores essential elements by changing them.

Elements Embedded in a Modeling Process. Following the previous section,
the important point is to decide what kinds of modeling processes we should
focus on. In this stage of the IbI approach, the following three processes
are employed, for the reasons given below. Note that we do not claim that
these three processes are sufficient for finding the underlying elements in
organizations or societies. Other viewpoints can be considered in addition
to them.

— Concept breakdown: When implementing a concept in a computational 
model, we must clarify abstract parts by breaking the concept down into 
detailed and operationalized parts from the computational viewpoint. Since 
characteristics of multiagent organizations or societies change depending 
on such a breakdown process, key elements are likely to be embedded in 
this modeling process. 

— Assumptions/premises modification: We tend to implement computa- 
tional models under assumptions or premises that are generally set uncon- 
sciously. This tendency increases as we concentrate on investigating issues. 
However, such assumptions or premises have a high possibility of being key 
elements because the results drastically change by varying assumptions or 
premises. 

— Investigating layers change: When investigating characteristics of or- 
ganizations or societies, some of the characteristics are found in a certain 
layer while others may be found in another layer. This indicates that a 
change in the layer for an investigation has the potential of finding new 
key elements that affect the characteristics of organizations or societies. 



13.3.2 Applications of IbI Approach

This section briefly describes three applications of the IbI approach.

Concept Breakdown. Organizational learning (OL) [13.1, 13.11] is roughly
characterized as organizational activities that solve problems that cannot 
be solved at an individual level, and it has a large influence on the charac- 
teristics of organizations. However, the concept of OL can be implemented 
(broken down) in many ways from a computational viewpoint. Focusing on 
this feature, we found that the following three elements affect the characte- 
ristics of multiagent organizations through breaking the concept of OL down 
in a certain way [13.20]: (1) the independence of learning mechanisms; (2) 
the execution order of learning mechanisms; and (3) the combination of ex- 
ploration at an individual level and exploitation at an organizational level. 
These implications can be revealed through the implementation of a concept 
breakdown and the interpretation of simulation results. 

Assumptions/premises modification. As shown in the typical example 
of the prisoner’s dilemma [13.4], agents are roughly divided into the following 
two categories: (1) the selfish or competitive type and (2) the altruistic or 
cooperative type. This classification effectively distinguishes goals of agents 
at individual levels from those at organizational levels. However, we found
that the evaluation of agents affected the characteristics of multiagent
organizations more than the goals of agents did [13.22]. This implication
cannot be revealed from a goal-related perspective; it emerges only through
the implementation of varied premises that add an evaluation perspective and
the interpretation of simulation results.

Investigating layers change. One of the important problems in an orga- 
nization is solving the trade-off between exploration and exploitation [13.18]. 
To address this issue, we focused on the fact that the trade-off between ex- 
ploration and exploitation is not embedded in one layer but found in several 
layers. Then, we found that a certain problem-specific trade-off could con- 
tribute to solving the fundamental trade-off between solutions (related to 
exploration) and costs (related to exploitation) [13.23]. This implication
cannot be revealed by considering fundamental trade-offs alone; it emerges
through the implementation of a framework that allows other trade-offs to be
investigated and the interpretation of simulation results.



13.4 Discussion 

13.4.1 Cumulative Progress 

First, we discuss how our approach has the potential to promote cumula- 
tive progress in agent-based approaches. As mentioned in Section 13.2, the 
following four points are important for cumulative progress: (a) common 
test-beds; (b) standard computational models; (c) validation and advance 
of older works; and (d) standard evaluation criteria. Although conventional 
agent-based approaches do not encourage researchers to fully address the four 
points, our approach tackles them as follows. 

— (a) Common test-beds: Since factors and assumptions embedded in a 
modeling process are mostly general, researchers not only can share results 
but also utilize factors. This indicates that our approach does not require 
common test-beds to share and utilize results. This advantage does not 
force researchers to adjust their ideas or methods to common test-beds. 

— (b) Standard computational models: Factors and assumptions em- 
bedded in a modeling process are kinds of common parts in simulation 
models. Therefore, standard computational models can be developed by 
combining several kinds of factors and assumptions. 

— (c) Validation and advance of older works: Factors and assumptions 
embedded in a modeling process are simple because they can be divided 
into each element. From this feature, it is easy to replicate older models 
if the factors and assumptions of older models are analyzed in advance, 
and such replication encourages researchers to validate older results. Fur- 
thermore, researchers have the chance to advance older works by simply 
adding and removing factors and assumptions. 

— (d) Standard evaluation criteria: Since factors and assumptions
embedded in a modeling process are independent of the issues addressed,
researchers can concentrate on evaluating how essential these elements are.
For instance, we measure such factors and assumptions in terms of their
degree of influence on results, simplicity of implementation, and so on.
These measures can serve as benchmarks in agent-based approaches.

From the above analysis, our approach has great potential to promote
cumulative progress in agent-based approaches. However, to benefit fully
from our approach, it is important to (1) store many factors and assumptions
embedded in a modeling process and (2) systematize these elements in advance
for easy utilization. If these points are achieved, our approach enables us
to understand what has been done and what remains to be done in agent-based
approaches by simply investigating the repository of underlying factors and
assumptions.



13.4.2 Potential of Our Approach 

Next, we discuss the potential of our approach in terms of the following 
viewpoints: (1) linkage to real world; (2) tool-kits; and (3) the KISS principle. 

Linkage to Real World. Linkage to the real world is one of the major
problems in agent-based simulations. Even though many useful implications
can be found in computational simulations, we cannot guarantee that these
implications are valid in the real world. On this problem, Axelrod answered
in his book as follows: “Although agent-based modeling employs simulation,
it does not aim to provide an accurate representation of a particular
empirical application. Instead, the goal of agent-based modeling is to
enrich our understanding of fundamental processes that may appear in a
variety of applications [13.4].” Carley, who proposed the concept of
computational organization theory (COT) [13.7], responded as follows:
“Human organizations can be viewed as inherently computational because many
of their activities transform information from one form to another, and
because organizational activity is frequently information driven [13.9].”
This assertion supports the effectiveness of computational analysis.

Concerning our approach which focuses on factors and assumptions em- 
bedded in a modeling process, these elements offer potential power to affect 
the real world. This is because the elements have a large influence on results 
even when they slightly change. Although simulation results do not follow 
the real world because the real world includes several kinds of unexpected 
factors and the observed phenomena only show one aspect of the real world, 
our approach can identify essential keys that affect the real world. 

Tool-kits. Recently, many agent-based simulators, including Swarm^, have
been proposed, and these have contributed to understanding complex
organizations and social phenomena. However, the following important
problems still remain: (1) agent-based simulators are mostly useful as
visualization tools, not as computational simulation tools, because we still
have to design the essential parts of simulations, such as the internal
models of agents; and (2) agent-based simulators are often built for
specific issues. Researchers also build their own tools instead of using
tools built by others, and thus it is difficult to share the same
agent-based simulators. These two problems clearly prevent cumulative
progress in agent-based approaches. To overcome these problems, Axelrod
devoted himself to developing general tools for agent-based approaches
[13.3], but he finally gave it up, because most tools for social and
organizational simulations have to be designed for specific tasks, and thus
few parts can be shared or applied to other models [13.5].

^ See http://www.swarm.org for details.

In comparison with the above conventional agent-based simulators, our 
approach has the capability of extracting common parts of simulations by 
exploring factors and assumptions embedded in a modeling process. Since 
these common parts are mostly related to fundamental parts of an agent 
design and are not specific to addressed issues, they can be used as tool-kits. 
This indicates that our approach provides general tool-kits that are difficult 
to find by developing domain-specific tool-kits. 

KISS Principle. The KISS principle^ proposed by Axelrod claims that sim- 
ple models should be implemented to understand the fundamental processes 
in organizational or social phenomena [13.4].^ This suggestion implies that 
one can be confident of understanding results by knowing everything that 
went into the model. Note that the KISS principle does not merely claim to 
make everything simple but also to leave essential parts by removing non- 
essential ones. Based on this claim, one important question remains: how do 
we figure out the essential parts? According to Axelrod, one method is to 
conversely derive the essential parts by investigating results and facts [13.5]. 
However, this derivation requires good sense, and it is neither easy nor an 
application of the scientific method. In comparison with this situation, the 
factors and assumptions found by our approach have high possibilities of 
being essential parts because these elements change the characteristics of 
multiagent organizations or societies. Of course, not all elements are
required to implement models, but it is worthwhile to consider such elements
as candidates before implementing models. Because of this advantage, our
approach offers great potential to support the KISS principle in terms of
finding essential parts.

^ This principle stands for the army slogan “keep it simple, stupid.”

^ Strictly, he pointed out that assumptions underlying the agent-based model
should be simple and also claimed that the complexity of agent-based
modeling should be in the simulated results, not in the assumptions of the
model.



13.5 Conclusions

This paper stressed the importance of focusing on modeling processes toward 
achieving cumulative progress in agent-based approaches. In particular, this 
paper suggested that the analysis of modeling processes can help to find 
elements that directly affect the characteristics of multiagent organizations 
or societies. Furthermore, we also showed that these elements were useful for 
an alternative understanding of complex organizations or social phenomena. 

By investigating the capabilities of our approach, we found the following 
two implications. First, our approach has great potential to promote cumula- 
tive progress in agent-based approaches in terms of (a) common test-beds, (b) 
standard computational models, (c) validation and advance of older works, 
and (d) standard evaluation criteria. Second, the elements found by our
approach are highly likely to affect the real world, to be usable as
tool-kits, and to support the KISS principle.

However, this paper only discussed the high potential of our approach for
cumulative progress and did not demonstrate it in the real world.
Furthermore, this paper did not specify the range in which our approach is
effective. These issues should be addressed in the near future. We also have
to investigate when and which elements found by our approach should be
considered in particular situations.



In this study, we rethought the efficient market hypothesis from the
viewpoint of the complexity of market participants’ prediction methods and
of the dynamics of the market price, and examined the hypothesis using
simulation results from our artificial market model. As a result, we found
two differences from the hypothesis: (a) the complexity of markets was not
fixed, but changed with the complexity of agents; and (b) when agents
increased the complexity of their prediction methods, the structure of the
dynamic patterns of the market price did not disappear, but it could not be
described by an equation of any dimension.



14.1 Introduction 

Would you be surprised if the forecasts of financial specialists performed
the same as randomly generated forecasts?

In the field of economics, the theory of financial markets called the
efficient market hypothesis was proposed in the 1970s, and it has caused
much argument to this day. According to this hypothesis, the movement of the
price in financial markets is a random walk and cannot be predicted.
Therefore, all forecasts perform the same. The theories of financial
engineering, which have developed greatly, are based on this hypothesis and
treat financial prices as stochastic processes.

Although many statistical verifications of the hypothesis have been
performed using actual data, the hypothesis has not been verified directly
because it includes the expectation formation of market participants. In
recent years, however, the artificial market approach, which builds a
virtual market model and runs simulations on a computer, has appeared, and
studies using this approach try to verify the hypothesis directly [14.1,
14.2, 14.3].

This study rethinks the efficient market hypothesis from the new viewpoint
of the relation between the complexity of market participants’ prediction
formulas and the complexity of the movement of a market price, and examines
the hypothesis using simulation results from the artificial market model.



14.2 The Efficient Market Hypothesis Seen from Complexity

The main points of the efficient market hypothesis are summarized as follows. 

— Each participant in a financial market takes in, very quickly and
exactly, all the information related to the movement of the market price,
and uses it for price expectation.

— The market price that is determined by the dealings between such market
participants properly reflects all the relevant information that is
available at present.

— Therefore, there is no room for any person to find a new relation between
the market price and the available information and thereby gain an advantage
over other persons. That is, the movement of the market price becomes a
random walk driven only by new information, and nobody can predict it.

When the above main points are reconsidered from the viewpoint of
complexity, the efficient market hypothesis implicitly contains the
following.

— In order to take in suitable information, each market participant makes
his prediction formula more complex by learning and tries to grasp the
structure of the determination formula of the market price.

— The structure of the price determination formula is fixed and independent
of the learning of market participants. Eventually the market participants
detect the structure, and it disappears.

That is, the efficient market hypothesis needs two premises: (a) the
independence of the complexity of the movement of the market price from the
complexity of each market participant’s prediction formula and (b) the
existence of a motivation for learning by each market participant.

On the other hand, using an artificial market simulation, de la Maza [14.4]
found that when the dimension of the market participants’ prediction formula
went up from 0 to 1, the movement of the market price also changed from a
random walk to linearity. That is, he showed the possibility that the
complexity of market participants and the complexity of a market are not
independent.

Then, what motivates each market participant to make his prediction formula
more complex? Joshi et al. [14.5] think that it is because a situation
similar to the prisoner’s dilemma game has occurred. In their artificial
market model, incorporating the moving-average technique of technical
analysis into the prediction method, and thereby raising the dimension of
the prediction formula from 0 to 1, corresponds to the defection strategy of
the prisoner’s dilemma game. On the other hand, not using technical analysis
for prediction corresponds to the cooperation strategy. The simulation
results showed the following two conditions for a prisoner’s dilemma
situation.

Condition 1. If one raises his prediction dimension, his prediction becomes
more accurate and the profit from his dealings increases. Thus, a
motivation for the defection strategy exists.

Condition 2. However, when everybody raises the dimension, the movement of
the market price becomes more complicated, and the prediction accuracy falls
below the level obtained when nobody uses technical analysis.

Thus, because everybody raises the dimension of his prediction formula in
pursuit of profit, the prediction accuracy becomes worse than before.
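
The two conditions can be written as a simple check on prediction errors.
The sketch below is our own illustration: the function name, the
interpretation of Condition 1 as "one agent raises the dimension while the
others do not", and the example numbers are assumptions and are not results
from the cited simulation.

```python
def is_prisoners_dilemma(
    error_nobody_raises: float,
    error_lone_raiser: float,
    error_everybody_raises: float,
) -> bool:
    """Check Conditions 1 and 2 on prediction errors (lower is better).

    Condition 1: an agent that raises its dimension on its own predicts
                 better than when nobody raises it (individual incentive).
    Condition 2: when everybody raises the dimension, prediction is worse
                 than when nobody does (collective loss).
    """
    condition_1 = error_lone_raiser < error_nobody_raises
    condition_2 = error_everybody_raises > error_nobody_raises
    return condition_1 and condition_2


# Hypothetical error levels, chosen only to exercise the check.
print(is_prisoners_dilemma(1.0, 0.8, 1.5))  # True
print(is_prisoners_dilemma(1.0, 0.8, 0.9))  # False
```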

In the following sections, by the artificial market simulation, we analyze 
the complexity of a market and the prisoner’s dilemma situation when a 
prediction dimension becomes larger. 



14.3 Artificial Market Model 

The artificial market is a virtual financial market with 50 virtual dealers
(agents) in a computer. One financial capital and one non-risk capital exist
in this artificial market. Each agent forms an expectation of the movement
of the financial price and changes his position in the financial and
non-risk capital so as to maximize the utility of his expected profit. In
the artificial market, one term consists of four steps (expectation, order,
price determination, and learning), and time progresses discretely by
repeating these four steps.



14.3.1 Expectation 

Each agent forms an expectation of the change in the financial price of this
term using a weighted sum of past changes in the financial price. That is,
since no fundamentals information exists in the market in this study, the
agents form their expectations of the change in the financial price only by
technical analysis.

The expectation formula of each agent is the autoregressive integrated
moving average model ARIMA(n, 1, 0), where n is the number of terms of past
price changes used for expectation. The larger n is, the larger the
dimension of the expectation formula is. Thus, in this study, n is regarded
as the complexity of each agent’s expectation.

The expectation formula is as follows, where $P_t$ is the financial price of 
this term, which is not yet determined, and $\tilde{y}_t$ is the expected 
change of the financial price $(P_t - P_{t-1})$. 

$$\tilde{y}_t = \sum_{i=1}^{n} b_i \, y_{t-i} + e_t = \mathbf{x}_t' \mathbf{b}_t + e_t \qquad (14.1)$$

Here, $e_t$ is drawn from a normal distribution with mean 0 and standard 
deviation 0.1, $\mathbf{b}_t$ is the vector of coefficients of the prediction 
formula¹, $(b_1, \dots, b_n)'$, and $\mathbf{x}_t$ is the vector of 
explanatory variables of the prediction formula, i.e., the past price 
changes², $(y_{t-1}, \dots, y_{t-n})'$. 
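As a minimal sketch of the expectation step in formula (14.1), the weighted sum plus noise can be written as below. The function name `expect_change` and its parameters are illustrative assumptions and do not come from the original study; only the noise standard deviation of 0.1 is taken from the text.

```python
import random


def expect_change(coeffs, past_changes, noise_sd=0.1, rng=random):
    """Expected price change: sum_i b_i * y_{t-i} + e_t, as in formula (14.1).

    coeffs       -- b_1 .. b_n (hypothetical argument name)
    past_changes -- y_{t-1} .. y_{t-n}, most recent first
    noise_sd     -- standard deviation of e_t (0.1 in the text)
    """
    if len(coeffs) != len(past_changes):
        raise ValueError("coeffs and past_changes must both have length n")
    weighted_sum = sum(b * y for b, y in zip(coeffs, past_changes))
    return weighted_sum + rng.gauss(0.0, noise_sd)
```

With n = 0 both lists are empty and the expectation reduces to the noise term alone.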



14.3.2 Order 



It is assumed that each agent has a utility function of expected profit with 
risk avoidance. Then the optimum position in the financial capital, i.e., 
the position with the maximum utility, $q_t^*$, is proportional to the 
expected change $\tilde{y}_t$ of formula (14.1). 

$$q_t^* = a \, \tilde{y}_t \qquad (14.2)$$

where $a$ is a coefficient. Each agent's order amount $o_t$ is the difference 
between the optimum position $q_t^*$ and the current position $q_{t-1}$. 

$$o_t = q_t^* - q_{t-1} \qquad (14.3)$$

If the market price $P_t$ is lower (higher) than his expected price 
$(P_{t-1} + \tilde{y}_t)$, each agent orders to buy (sell). The amount of the 
order is $o_t$. 

If $o_t > 0$: 
  Buy $o_t$ if $P_t < P_{t-1} + \tilde{y}_t$ 
  No action if $P_t > P_{t-1} + \tilde{y}_t$ 

If $o_t < 0$: 
  No action if $P_t < P_{t-1} + \tilde{y}_t$ 
  Sell $|o_t|$ if $P_t > P_{t-1} + \tilde{y}_t$ 
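The order rule in formulas (14.2) and (14.3) can be sketched as follows. This is only an illustration: the function name `decide_order`, the default value of the coefficient `a`, and the treatment of the boundary case (price exactly equal to the expected price, which the text does not specify and which is mapped to "no action" here) are assumptions.

```python
def decide_order(expected_change, prev_position, prev_price, market_price, a=1.0):
    """Return (action, amount) for one agent in one term.

    expected_change -- the agent's expected price change from formula (14.1)
    prev_position   -- q_{t-1}, the position held before this term
    a               -- proportionality coefficient of formula (14.2); 1.0 is a placeholder
    """
    target_position = a * expected_change        # formula (14.2)
    order = target_position - prev_position      # formula (14.3)
    expected_price = prev_price + expected_change

    if order > 0 and market_price < expected_price:
        return ("buy", order)
    if order < 0 and market_price > expected_price:
        return ("sell", -order)
    return ("none", 0.0)
```

For example, an agent holding 0.5 units who expects the price to rise from 100 by 2 buys 1.5 units as long as the market price stays below 102.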



14.3.3 Price Determination 

All the orders of the 50 agents in the market are accumulated, and the market 
price of this term is determined as the value at which demand and supply 
are balanced. Transactions are made between the buyers who bid a price 
higher than the market price and the sellers who asked a lower price. 



14.3.4 Learning 

Each agent updates the coefficients $\mathbf{b}_t$ of the prediction formula 
(14.1) using the successive (recursive) least-squares method with the 
information on the change $y_t$ of the newly determined market price³. The 
least-squares method is as follows [14.6]. 

¹ The initial value of the coefficients $\mathbf{b}_0$ is given by uniform 
random numbers from -1 to 1. 

² At the start, the initial values $\mathbf{x}_0$ are generated from a normal 
distribution with mean 0 and standard deviation 1. 



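$$\mathbf{b}_{t+1} = \mathbf{b}_t + \frac{(X_t' X_t)^{-1} \mathbf{x}_t \, (y_t - \mathbf{x}_t' \mathbf{b}_t)}{f_t} \qquad (14.4)$$

where $X_t$ is a learning matrix which starts with $X_0 = 100 \times I$ ($I$ is 
a unit matrix) and is updated by the following formulas. 

$$(X_t' X_t)^{-1} = (X_{t-1}' X_{t-1})^{-1} - \frac{(X_{t-1}' X_{t-1})^{-1} \mathbf{x}_t \mathbf{x}_t' (X_{t-1}' X_{t-1})^{-1}}{f_t} \qquad (14.5)$$

$$f_t = 1 + \mathbf{x}_t' (X_{t-1}' X_{t-1})^{-1} \mathbf{x}_t \qquad (14.6)$$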
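For readers who want to experiment with the learning step, the following is a sketch of one recursive least-squares update in plain Python. It follows the textbook form of the method, in which the gain is computed from the previous inverse matrix divided by $f_t$; the index convention may therefore differ in detail from formula (14.4) as printed. The name `rls_update` and the choice to store the inverse matrix directly as `P` (initialized to 100 times the identity) are assumptions made for illustration.

```python
def rls_update(b, P, x, y):
    """One recursive least-squares step.

    b -- current coefficients (list of length n)
    P -- current inverse matrix, n x n list of lists, assumed symmetric
    x -- explanatory variables for this term (list of length n)
    y -- observed price change for this term
    Returns (new_b, new_P).
    """
    n = len(b)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    f = 1.0 + sum(x[i] * Px[i] for i in range(n))           # as in formula (14.6)
    error = y - sum(x[i] * b[i] for i in range(n))          # y_t - x_t' b_t
    new_b = [b[i] + Px[i] * error / f for i in range(n)]
    # Rank-one downdate of the inverse, as in formula (14.5); uses symmetry of P.
    new_P = [[P[i][j] - Px[i] * Px[j] / f for j in range(n)] for i in range(n)]
    return new_b, new_P
```

Calling the function once per term with the newly observed price change keeps the cost per update at O(n^2), which is why a recursive form is convenient when 50 agents each relearn in every one of 4000 terms.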



14.4 Simulation Result 

In this section, we use the artificial market model to examine the 
complexity of the market and the prisoner's dilemma situation that arises 
when the prediction dimension becomes large. 

14.4.1 Merit of Complicating a Prediction Formula 

We investigated the merit of complicating the prediction formula. The 
dimension of the prediction formulas of 25 agents was set to n, and the 
dimension of the prediction formulas of the other 25 agents was set to n + 1. 
Each simulation had 4000 terms, each consisting of the four steps in 
section 14.3. The averages of the forecast errors were calculated both for 
the agent group with n dimensions and for the group with n + 1 dimensions. 
The forecast error was the difference between each agent's prediction value 
and the market price. The initial value of the random numbers was changed and 
100 simulations were carried out⁴. Figure 14.1 shows the difference between 
the forecast errors of the group with n + 1 dimensions and those of the 
group with n dimensions. 

While the number of dimensions in the prediction formula is small, the 
merit of complicating the prediction formula is large. An agent who can 
predict correctly can increase his profit. Thus, when the number of 
dimensions is small, Condition 1 of the prisoner's dilemma situation in 
section 14.2 holds. However, when the number of dimensions becomes large, 
the merit of complicating the prediction formula disappears. 

³ When n = 0, the prediction value is a random number and learning is not 
performed. 

⁴ Since the averages could not be calculated when the market price had 
diverged, we carried out simulations until we obtained 100 simulations whose 
paths did not diverge. 

Fig. 14.1. Comparison of forecast errors: the X-axis is the dimension of the 
agents' forecast equation, and the Y-axis is the difference in forecast errors 
(the forecast errors of the group with n dimensions are set to 100). Positive 
(negative) values mean that the forecast errors of the group with n + 1 
dimensions are smaller (larger). 



14.4.2 The Demerit in the Whole Market 

We examined whether the prediction of prices becomes harder in the whole 
market as the dimension of the prediction formulas increases. In this 
simulation, the prediction formulas of all 50 agents had the same dimension 
n. We carried out the simulation with 4000 terms 100 times⁵. We accumulated 
the forecast errors over the 4000 terms and averaged them over the 50 agents 
and the 100 simulations (Fig. 14.2). 

As a result, when the number of dimensions in the prediction formula was 
small, the forecast error became larger as the number of dimensions 
increased. That is, Condition 2 of the prisoner's dilemma situation in 
section 14.2 held. However, the forecast error converged to a fixed value 
when the number of dimensions was larger than three. 



14.4.3 Development of the Complexity of a Market 

In order to examine whether the complexity of the movement of a market 
price is independent of the complexity of each market participant's 
prediction formula, we carried out a correlation dimension analysis⁶. All 50 
agents had prediction formulas of the same dimension n. We carried out the 
simulation with 4000 terms 100 times. Changing the embedding dimension, we 
calculated the correlation dimensions using the price data of the 3885 terms 
in the latter part of each simulation, when learning had stabilized to some 
extent (Fig. 14.3). 

⁵ No diverging path was seen when all agents' prediction formulas had the 
same dimension. 

⁶ The procedure of the correlation dimension analysis is described in 
[14.7, 14.8]. 

Fig. 14.2. Forecast errors. The X-axis is the dimension of the agents' 
forecast equation. 

As a result, when the prediction dimension was 0, the correlation dimension 
curve was convex downward like the theoretical value of a random walk 
(Fig. 14.3a). That is, there was no structure in the dynamics of the market 
price. However, when the prediction dimension increased slightly, the 
correlation dimension curve was convex upward and saturated (Fig. 14.3b). 
Thus, a structure that could be described by an equation of a finite 
dimension appeared in the dynamics of the market price. Furthermore, when 
the prediction dimension was raised further, the correlation dimension curve 
became a straight line (Fig. 14.3c). Thus, the correlation dimension curve 
was neither convex downward like a random walk nor saturated. That is, there 
was a structure in the dynamics of the market price, but it could not be 
described by an equation of any finite dimension. 
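The procedure actually used in this study is the one described in [14.7, 14.8] and is not reproduced here. Purely as an illustration of what a correlation dimension estimate involves, the sketch below uses the standard Grassberger-Procaccia approach: embed the series with delay vectors, count the fraction of point pairs closer than a radius r, and take the slope of log C(r) against log r. The function names, the use of the maximum norm, and the two-radius slope estimate are all simplifying assumptions.

```python
import math


def embed(series, m):
    """Delay vectors of embedding dimension m (delay 1)."""
    return [tuple(series[i:i + m]) for i in range(len(series) - m + 1)]


def correlation_integral(points, r):
    """Fraction of distinct point pairs whose max-norm distance is below r."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if max(abs(a - b) for a, b in zip(points[i], points[j])) < r:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(series, m, r_small, r_large):
    """Slope of log C(r) versus log r between two radii."""
    points = embed(series, m)
    c_small = correlation_integral(points, r_small)
    c_large = correlation_integral(points, r_large)
    if c_small <= 0.0 or c_large <= 0.0:
        raise ValueError("radii too small: no point pairs found")
    return (math.log(c_large) - math.log(c_small)) / (
        math.log(r_large) - math.log(r_small)
    )
```

Repeating the estimate for increasing m and plotting the result against the embedding dimension gives a curve of the kind discussed above: it saturates for low-dimensional dynamics and keeps growing for a random walk. The double loop is quadratic in the series length, so a real analysis of several thousand terms would normally use a faster neighbor search.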

According to Nakajima [14.7, 14.8], who analyzed Tokyo Stock Exchange Stock 
Price Index data, the logarithm of the correlation dimension went up 
linearly, like this simulation result in Fig. 14.3c. That is, when each 
agent's prediction dimension increases, the dynamics of the price in the 
artificial market, like the price data in the real world, can be described 
roughly by an equation of some dimension, and a more precise description is 
attained by increasing the number of dimensions. However, the movement of 
the price data cannot be described completely by an equation of any finite 
dimension. That is, the number of variables related to the movement cannot 
be specified completely. 

Fig. 14.3. Correlation dimensions: a) the agents' prediction dimension is 0; 
b) the agents' prediction dimension is 1; c) the agents' prediction dimension 
is 10. The X-axis is the logarithm of the embedding dimension. The solid line 
is the average of the correlation dimension over 100 paths. The dotted line 
is the theoretical value for a random walk. 

14.5 New Efficient Market Hypothesis 

The simulation results are summarized as follows. 

- When each market participant's prediction dimension is 0, the movement 
of the market price resembles a random walk. If the prediction dimension 
increases, a structure that can be described by an equation of a finite 
dimension appears in the movement of the price. 

- Therefore, if an agent increases his prediction dimension, the prediction 
dimension approaches the dimension of the price determination formula and 
his prediction becomes more accurate. Thus, there is a merit in complicating 
the prediction formula. However, if everybody increases his or her 
prediction dimension, prediction accuracy becomes lower than before. That 
is, a prisoner's dilemma situation arises. 

- If everybody continues to increase the prediction dimension in the 
prisoner's dilemma situation, the movement of the market price comes to have 
a structure that cannot be described completely by an equation of any finite 
dimension. 

The structure of the movement of the market price changed as market 
participants changed their prediction formulas. That is, contrary to the 
efficient market hypothesis, the complexity of market participants and the 
complexity of a market are not independent. The simulation results also 
suggest that the structure of the dynamics of the price data did not 
disappear when market participants continued to complicate their prediction 
formulas. In the final state, however much each market participant increases 
his prediction dimension, he cannot predict the market price completely. 

In such a state, where learning has no "correct answer", it is thought that 
the procedure of learning by each market participant, in addition to the 
result of learning, becomes a key factor in the movement of the market 
price. As Kichiji [14.9] said, the efficiency of learning by a market 
participant, differences in cognitive frameworks, the interaction between 
market participants, the method of informational choice, and so on become 
important. 

Another key point is the mechanism of market price determination. In 
this study we assumed that the market price was determined discretely as 
an equilibrium price. Alternatively, we can assume that the market price is 
determined continuously as the transaction prices of individual deals. The 
mechanism of market price determination is the mechanism by which 
individual complexity is accumulated into the complexity of a market. 
Therefore, it has a large influence on the relation between the complexity 
of market participants' prediction formulas and the complexity of the 
movement of the market price. It would be interesting to examine whether 
the same simulation results are obtained when the mechanism of market price 
determination changes. 

14.6 Conclusion 

This study examined the efficient market hypothesis using an artificial 
market approach. As a result, the following two points that differ from the 
efficient market hypothesis were found. 

- While the prediction dimension of agents is small, a describable structure 
exists in the movement of the market price, and there is a motivation to 
increase the prediction dimension. 

- Even if market participants increase the prediction dimension, the 
structure of the movement of the market price does not disappear. In the 
end, however much each market participant increases his prediction 
dimension, he cannot predict the market price completely. 

As future work, we want to investigate the influence of (a) the procedure 
of learning by a market participant and (b) the mechanism of price 
determination on the relation between the complexity of market 
participants' prediction formulas and the complexity of the movement of the 
market price. 

Acknowledgement. I am deeply thankful to Prof. Yoshihiro Nakajima, who 
offered useful comments in the course of this research. 

U-Mart is an interdisciplinary research program on agent-based artificial 
markets. U-Mart proposes an open-type test bed to study the trading 
strategies of agents, the behavior of the market, and their relationship. A 
public experiment (Pre U-Mart 2000) using the proposed system was held in 
August 2000. More than 40 software agents (computer programs for trading) 
from 11 teams participated in this experiment. This paper reports the 
outline of the experiment, the trading strategies of the participating 
agents, and the results of the experiment. While Pre U-Mart 2000 treated only 
software agents, the U-Mart system is designed for the participation of 
human players as well as software agents. A gaming simulation with human 
players using the U-Mart system, held at Kyoto University, is also introduced 
briefly. 



15.1 Introduction 

Complex behavior of market economy, typically observed in financial markets, 
is not fully explained by conventional economic theories. A new approach to 
this problem is an artificial market which enables computational experiments 
on virtual markets using agent simulation[15.1]. 

Studies on artificial markets have achieved a variety of interesting results. 
However, they also clarified the difficulties peculiar to this agent simulation 
approach, such as that: 

— researchers from different fields need to cooperate due to the interdiscipli- 
nary nature of this approach, 

— it is not easy to design a model which combines complexity (to imitate real 
markets) and simplicity (to enable computational experiments), and 

— researchers need to share common understanding on experimental configu- 
rations and results which are more complicated than theoretical models. 



T. Terano et al. (Eds.): JSAI 2001 Workshops, LNAI 2253, pp. 121-131, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 

U-Mart¹ [15.2, 15.3, 15.4] is a research program to address these problems 
of artificial market studies. We have developed an artificial market 
simulation system, called the U-Mart system, to provide a test bed on which 
researchers from economics and information science can carry out 
experiments with a common understanding. We are promoting diverse research 
on markets by opening this system to the public. 

We have conducted an open experiment, Pre U-Mart 2000, on this system, 
inviting more than 40 software agents from the public. This paper reports the 
results of the experiment, along with the strategies of the participating 
agents. The U-Mart system is designed to allow human players to participate 
in market experiments. This paper also briefly introduces the human gaming 
simulation conducted at Kyoto University. 



15.2 Outlines of U-Mart System 

In the U-Mart system, 'futures' of a real stock index are traded in a virtual 
market. This allows the market simulation environment to reflect the 
complexity of real markets and, at the same time, enables independent price 
formation. The U-Mart system is implemented as a client-server system, which 
exchanges information, such as buy and sell orders, via the Internet using a 
dedicated protocol implemented on TCP/IP. A server, which imitates an 
'exchange', accepts orders from clients, determines prices, matches buy and 
sell orders, and manages the clients' accounts. Each client obtains 
information, such as market performance, from the server and places orders 
based on its own decisions. In the U-Mart system, human agents, as well as 
software agents, are allowed to participate in market experiments. Details of 
the U-Mart system are provided in [15.4]. 



Fig. 15.1. U-Mart Artificial Market System (figure labels: Final Settlement 
with Spot Price; Contracted Price) 

¹ Originally called V-Mart. 

15.3 Outline of Open Experiment, Pre U-Mart 2000 

15.3.1 Open Experiment and Its Objectives 

We conducted an open experiment, Pre U-Mart 2000, on August 19th, 2000, 
as a part of the 6th Emergent System Symposium of The Society of Instrument 
and Control Engineers in Japan. 

The objectives of this experiment were to investigate variations in trading 
strategies and development methods for software agents, and to verify the 
actual behavior of a market simulation among independently developed agents. 
Since it was our first open experiment, we limited the entry to software 
agents. This is why we named it "Pre U-Mart 2000": it targets only a part of 
the U-Mart concept. The participants received an agent development package 
for the U-Mart system in advance. This package contains templates of simple 
software agents and the track record of the J30 stock index (used as spot 
data). 

15.3.2 Experimental System 

For the experiment, the Pre U-Mart 2000 committee set up a server machine, 
and the participants ran agent programs on their notebook PCs connected to 
the server via Ethernet. The participants and the audience could watch the 
progress of the experiment through a video projector. 

We tested the operation of the system on the first day of the symposium 
(August 18th), and conducted the experiment in the afternoon of August 19th. 

15.3.3 Configuration of Experiment 

The price determination and contract algorithms are described in [15.4]. Table 
15.1 shows the parameters for the market. We use Dow Jones Industrial 
Average (scaled to J30 equivalent) to prevent participants from estimating 
the spot market data from distributed J30 data. 

The exchange (server) settles the accounts of agents at the end of each 
virtual day. When the cash balance of an agent is less than zero after the 
settlement, the exchange automatically lends money to the agent up to its 
loan limit. The loan bears interest of 10% per annum, which the exchange 
collects at the settlement of the next virtual day. An agent goes bankrupt 
if its cash balance is still less than zero after obtaining the maximum 
loan; the agent is then not allowed to make any more deals. 
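The settlement rule above can be sketched as a small function. This is a simplified reading, not the U-Mart implementation: the function name `settle_day`, the conversion of the 10% annual rate into a simple daily rate over 365 days, and the assumption that loans are never repaid are all illustrative choices. Only the 10% rate and the loan limit of 30 million YEN (Table 15.1) come from the text.

```python
def settle_day(cash, loan, loan_limit=30_000_000, annual_rate=0.10, days_per_year=365):
    """End-of-day settlement for one agent.

    cash -- cash balance after mark-to-market for the day (may be negative)
    loan -- loan outstanding from previous days
    Returns (cash, loan, bankrupt).
    """
    # Collect interest on the loan carried over from the previous virtual day.
    cash -= loan * annual_rate / days_per_year

    # Lend automatically, but never beyond the loan limit.
    if cash < 0:
        borrowed = min(-cash, loan_limit - loan)
        cash += borrowed
        loan += borrowed

    # Still negative after the maximum loan: the agent is bankrupt.
    bankrupt = cash < 0
    return cash, loan, bankrupt
```

A shortfall of 10 million YEN is covered entirely by a loan, whereas a shortfall of 40 million YEN exhausts the limit and leaves the agent bankrupt.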



15.4 Participated Agents and Their Strategies 

Eleven teams participated in the experiment, seven from engineering and four 
from economics. Each team was assigned a quota of five agents. 




124 H. Sato et al. 



Table 15.1. Parameters of Pre U-Mart 2000 

Item               | Setting 
Underlying Indices | Dow Jones Industrial Average, scaled to J30 equivalent 
Period             | 60 virtual days 
Order Methods      | market order/limit order 
Pricing Method     | ITAYOSE* 
Pricing Interval   | 15 seconds (real time) 
Number of Pricing  | 4 times/virtual day 
Trade Unit         | 1000-fold of contracted indices 
Bid and Asked      | indices in increments of one point 
Price Range        | no restriction 
Margin Money       | 300,000 YEN/Trade Unit 
Settlement System  | mark-to-market at closing price of the day 
Membership Fee     | none 
Cash on Hand       | 1 billion YEN/agent 
Loan Limit         | 30 million YEN 

* A pricing method that accumulates orders for a certain period, and decides 
a price so as to achieve the maximum contracted volume for the accumulated 
orders. 



The basic strategies of the participating agents are mainly based on 
time-series analysis (technical analysis) or on the price difference between 
the spot and futures markets². Some agents were programmed manually, and 
the others use learning/adaptation methods such as GAs and neural networks. 
There are other interesting agents, such as one that refers to the buying 
and selling behavior of other agents, one that implements explicit risk 
management, and one that learns in real time. The strategies of each team 
are described below. 

² Actual futures markets allow a strategy called "arbitrage", which earns a 
profit margin from the price difference by combining futures deals and spot 
deals. Since U-Mart only allows futures deals, a pure "arbitrage" strategy 
cannot be implemented. 

1. University of Tokushima team (Engineering): #1 - #5 
- Authors: Takao, I. Ono, N. Ono 
- Strategy: Some of their agents use neural networks (input: time series of 
price differences, output: buying/selling) trained with a GA. The other 
agents implement technical analysis methods, such as moving average, 
oscillator [15.5], and psychological line. 

2. Kyoto University team (Economics): #6 - #10 
- Authors: Koyama, Zaima, Matsui, Deguchi 
- Strategy: Some of their agents place orders based on the deviation between 
short-term and very short-term moving averages. The other agents implement 
an improved version of the psychological line. The number and amount of 
orders are also adjusted (for example, to buy more in the morning). 

3. Tokyo Institute of Technology - Fukumoto team (Engineering): #11 - #15 
- Author: Fukumoto 
- Strategy: Their agents predict the market trend with a regression equation, 
and place orders based on the deviation between the current spot price and 
the futures price. The parameters are learned with a GA. They manage 
positions and implement bullish/bearish . 

4. Tokyo Institute of Technology - Yamamura Lab. team (Engineering): #16 - #20 
- Authors: Yamashige, Kira, Ishii 
- Strategy: Some of their agents use neural networks (input: deviation 
between the gradient of the moving average and the closing price, and 
deviation between the lowest and highest prices in the past; output: 
expected price) trained with a hybrid algorithm. The other agents are one 
that sells/buys at crests and troughs of the price movement, and one that 
places orders after comparing its position with the price difference between 
the spot and futures markets. 

5. Univ. of Tsukuba and Yamatake Industrial team (Engineering): #21 - #25 
- Author: Murakami 
- Strategy: Their agents implement real-time learning of futures price 
prediction using a classifier system, F-OCS. The agents have learned heavy 
rises and falls of markets and have incorporated the skills to cope with 
them. 

6. Osaka Pref. University team (Engineering): #26 - #30 
- Author: Mori 
- Strategy: Parasitic agents that do not use price information. They depend 
only on the order information of other agents and place the same orders as 
the majority. 

7. Osaka Sangyo Univ. team (Economics): #31 - #35 
- Authors: Taniguchi, Ozaki 
- Strategy: Some of their agents place orders according to the trend and 
against the trend. The other agents react sensitively to the gradient of the 
price movement. 

8. National Defense Academy - Sato team (Engineering): #36 - #40 
- Author: Sato 
- Strategy: Their agents implement basic day-trading. They place selling 
orders a few percent higher, and purchase orders a few percent lower, than 
the closing price of the previous virtual day, and aim to profit from the 
difference between them. 

9. Kyoto Sangyo Univ. team (Economics): #41 - #45 
- Author: Nakashima 
- Strategy: Some of their agents place buying orders only or selling orders 
only, based on the dollar cost averaging method. The other agents place 
orders based on the 'ren-gyo-soku' method, a method of technical analysis. 

10. National Defense Academy - Ishinishi team (Engineering): #46 - #50 
- Author: Ishinishi 
- Strategy: Their agents place a buying order when the spot price is higher 
than the futures price, and place a selling order when the spot price is 
lower than the futures price. 

11. Osaka City University team (Economics): #51 - #55 
- Author: Shiozawa 
- Strategy: Basic technical analysis. 
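Two of the simpler strategies in the list above can be written down in a few lines. The sketch below is only an illustration of the rules as described for team 10 (compare spot and futures prices) and team 6 (follow the majority of other agents' orders); the function names, the string labels, and the handling of ties are assumptions, and the actual agents were not published in this form.

```python
def spot_futures_signal(spot_price, futures_price):
    """Team 10's rule as described: buy when spot is above futures, sell when below."""
    if spot_price > futures_price:
        return "buy"
    if spot_price < futures_price:
        return "sell"
    return "hold"  # tie handling is an assumption


def majority_signal(other_orders):
    """Team 6's rule as described: place the same order as the majority.

    other_orders -- list of "buy"/"sell" strings observed from other agents
    """
    buys = other_orders.count("buy")
    sells = other_orders.count("sell")
    if buys > sells:
        return "buy"
    if sells > buys:
        return "sell"
    return "hold"  # tie handling is an assumption
```

Neither function decides an order quantity or a limit price; those choices would have to be added separately.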







15.5 Experimental Result 

We conducted the experiment twice with different spot price series. 
The number of participating agents was 47 for the first round and 43 for the 
second round. Not every team used its full quota of five agents. 




Fig. 15.2. Prices and Traded Volumes for 1st Round (left) and 2nd Round 
(right). The X-axis is the virtual day. 



Table 15.2. Top 10 Performance of Agents for 1st and 2nd Round 

       1st Round               |        2nd Round 
Agent | Profit*1  | Team*2     | Agent | Profit*1  | Team*2 
#41   | 3,960,884 | 9          | #12   | 3,005,755 | 3 
#27   |   582,474 | 6          | #13   | 1,792,902 | 3 
#26   |   380,437 | 6          | #18   | 1,686,144 | 4 
#7    |   317,955 | 2          | #19   |   820,168 | 4 
#5    |   310,538 | 1          | #43   |   710,379 | 9 
#33   |   307,773 | 7          | #44   |   388,575 | 9 
#21   |   266,145 | 5          | #27   |   285,245 | 6 
#28   |   258,410 | 6          | #7    |   254,108 | 2 
#16   |   225,309 | 4          | #16   |   206,260 | 4 
#30   |   204,743 | 6          | #9    |   197,120 | 2 

*1: price unit: 1,000 YEN, *2: team is represented by its entry number 



15.5.1 First Round 

The spot price series for the first round goes up and down several times 
and ends at its beginning price. Figure 15.2 (left) shows the transitions of 
price and trade volume. Tables 15.2 and 15.3 show the performance of each 
agent and each team at the end of the game. 

Heavy rises and falls are repeated at the beginning because of combinations 
of excessive limit orders and market orders. Five agents go bankrupt between 
the 11th and 14th virtual days. No agent goes bankrupt before the 11th day 
because the rises and falls do not occur at the closing price, which 
directly affects the end-of-day settlement (cf. the 2nd round). After the 
five agents go bankrupt, the market calms down and deals are made around the 
spot price. The trade volume increases during the rapid price movements 
because of the huge volume of market orders. 

Table 15.3. Performance of Teams at Pre U-Mart 2000 

Team                          | 1st Round  | 2nd Round 
Kyoto Sangyo Univ.            |  2,717,039 | -1,059,526 
Osaka Pref. Univ.             |  1,512,561 | -4,309,662 
Univ. of Tokushima            |    661,096 | -1,393,736 
Kyoto Univ.                   |    635,519 | -1,175,857 
Sato (NDA)                    |    622,257 |    111,153 
Osaka Sangyo Univ.            |    501,101 | -1,504,747 
Yamamura Lab. (TIT)           |    358,853 |  2,751,064 
Univ. of Tsukuba and Yamatake |    332,358 |    192,780 
Osaka City Univ.              |    156,941 |    -53,780 
Fukumoto (TIT)                |   -232,420 |  4,079,164 
Ishinishi (NDA)               | -4,711,406 |    -99,237 

Teams are listed in descending order of 1st round profit (unit: 1,000 YEN). 



15.5.2 Second Round 

The spot price series for the second round shows a long-term downtrend. 
Figure 15.2 (right) shows the transitions of price and trade volume. Tables 
15.2 and 15.3 show the performance of each agent and each team at the end of 
the game. 

The second round shows only a few rapid price movements. This is because 
the price movement on the first day is too big (the futures price is 19,332 
YEN, while the spot price is 3,178 YEN) and the market closes at this price. 
Three agents go bankrupt and the other agents are seriously damaged as well. 
Consequently, the trade volume decreases after the second day. Two more 
agents go bankrupt on the 12th day because of the huge price movement at the 
closing. A total of five agents go bankrupt in the second round. The trade 
volume increases during the rapid price movements because of the huge volume 
of market orders. 



15.5.3 Variety of Agents 

Eleven teams participated in these experiments and the variety of the agents 
exceeded our expectations. 

When agents show similar behavior, deals tend to fail because their 
decisions are similar. In such a case, agents that place random orders need 
to be introduced into the market in order to achieve deals. In our 
experiments, prices were formed among the varied agents without random 
agents. 

Although several teams use the same analysis methods (moving average 
and psychological line), the final assets of these teams differ remarkably. 
This means that these teams interpreted the indices differently when 
implementing the methods as software agents. Technical analysis indicates 
"the time to buy (or sell)", but it does not recommend "the amount to buy 
(or sell) and at which price". We expect that this point will be clarified 
with a larger number of experiments. 

It is interesting that the agents #41-#45 (selling only/buying only) and 
#26-#30 (which do not use price data) achieved good results, especially in 
the first round. This does not mean that these strategies are always 
effective. However, they obviously run against the common belief that 
winners need to predict the future based on price data and to manage their 
positions appropriately. Their successful performance contributes to the 
variety of agents. 

In the future, more agents will implement the position management (im- 
plemented only on #11-#15) or the online learning for real-time modification 
of strategy (implemented only on #21-#25). 



15.5.4 Reason of Heavy Rises and Falls 

Heavy rises and falls occur at the beginning of both rounds. In these 
experiments, we did not restrict the price range, and the agents were 
allowed to place orders at unrealistic prices. Although these unrealistic 
orders normally do not affect price determination, they may be contracted 
when a huge volume of market orders is placed. In the price determination 
algorithm of the U-Mart system, selling market orders are treated as "limit 
orders lower than the lowest limit order" and buying market orders are 
treated as "limit orders higher than the highest limit order". This makes 
price formation vulnerable to a huge volume of market orders (see 
Figure 15.3). 
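The effect described above can be reproduced with a toy version of ITAYOSE pricing. The sketch below is not the U-Mart algorithm, which is specified in [15.4]; it only follows the definition given with Table 15.1 (choose the price that maximizes the contracted volume of the accumulated orders) and treats market orders as executable at any price. The function name, the restriction of candidate prices to the submitted limit prices, and the tie-breaking rule (smallest imbalance, then lowest price) are assumptions.

```python
def itayose_price(buy_limits, sell_limits, market_buy=0, market_sell=0):
    """Pick the price that maximizes contracted volume.

    buy_limits, sell_limits -- lists of (price, quantity) limit orders
    market_buy, market_sell -- total quantities of market orders
    Returns (price, contracted_volume).
    """
    candidates = sorted({p for p, _ in buy_limits} | {p for p, _ in sell_limits})
    if not candidates:
        raise ValueError("at least one limit order is needed to fix a price")

    best = None
    for price in candidates:
        # Market orders count at every price; limit orders only when acceptable.
        demand = market_buy + sum(q for p, q in buy_limits if p >= price)
        supply = market_sell + sum(q for p, q in sell_limits if p <= price)
        volume = min(demand, supply)
        key = (-volume, abs(demand - supply), price)
        if best is None or key < best[0]:
            best = (key, price, volume)
    return best[1], best[2]


# One unrealistic selling limit order at 500 sits far above the other orders.
buys = [(100, 5), (99, 5)]
sells = [(98, 5), (101, 5), (500, 5)]
normal = itayose_price(buys, sells)
flooded = itayose_price(buys, sells, market_buy=20)
```

Without market orders the unrealistic order is ignored and the price settles at 100. Once 20 units of buying market orders arrive, the volume is maximized only by including the order at 500, so the price jumps there, which is the vulnerability the paragraph above points out.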





Fig. 15.3. Price Determination by ITAYOSE: when limit orders are dominant 
(left); when market orders are dominant (right). 



There are two types of agents that place excessive orders. One type gives 
"a very low buying limit and a very high selling limit" (e.g., #38) and the 
other type gives "a very low selling limit and a very high buying limit" 
(e.g., #35). We had assumed that they would not affect the market, because 
the former type has difficulty in making deals and the latter type goes 
bankrupt immediately. However, in combination with market orders, they can 
seriously disturb the market. We may need to restrict the price range or to 
reconsider the price determination method. 



15.6 Experiments with Human Agents 

Heavy rises and falls have resulted at the beginning of the experiments with 
software agents. What happens if more sophisticated human agents deal in 
this virtual market? The U-Mart system can answer this question since it is 
designed to allow human agents to participate in market experiments. 

As an example of the behavior of virtual markets constructed by human 
agents, this section introduces the experiments conducted at Kyoto 
University as part of a lecture on gaming simulation³. 

The experiments with human agents were conducted three times under 
conditions similar to those of Pre U-Mart 2000, using different spot data 
each time. In these experiments, a small number of software agents were 
introduced into the market. They placed limit orders at prices determined by 
normally distributed random numbers centered on the spot price. 

Initially, the students made deals without strategy. It was natural because 
they were not familiar with the client software and they did not know much 
about futures markets or futures trade mechanisms. However, they started to 
understand these mechanisms by accumulating experience and became more 
strategic. 

The result of the third experiment (conducted on November 16th) is shown in 
Figure 15.4. It shows the transition of the spot price, the virtual market 
price (U-Mart Price), and the asset position of each agent. In this 
experiment, which involved one software agent and seven human agents 
(including one faculty member), the software agent made the best profit, and 
three students went bankrupt. 

According to the students' reports after the experiments, the bankrupt
students predicted a long-term downtrend in the spot price. They focused on
buying initially and kept selling after that, and then went bankrupt as the
spot price trended upward. The profitable students, on the other hand,
responded to short-term price movements. They made small profits with a
general strategy, namely selling when the price increased and buying when
the price decreased, and they maintained a stable position.

The experimental results show remarkable differences between the behavior
of human agents and that of the present software agents. Human agents not
only make technical analyses of short-term price movements, but also
predict long-term market trends and conceive strategies based on
impression.

Fig. 15.4. Experimental Result with Human Agents

* “Economics System Gaming” (Dr. Deguchi), given at the School of Economics,
Kyoto University. This is a biweekly lecture of two consecutive class
periods (180 min.), geared to undergraduate and graduate students.

Although the software agent made the best profit in this experiment, this
outcome depends heavily on chance, in particular on the spot data used and
on the strategies of the human agents. More experimental cases need to be
accumulated in order to analyze U-Mart as a market and to examine the
differences between human and software agents. We will also look into the
applicability of this system as an educational tool.



15.7 Conclusion and Acknowledgements 

In this paper, we have reported on experiments with the open-type
artificial market U-Mart, conducted with software agents and/or human
agents. The results of the experiments have shown that a variety of
software agents can be constructed, and have clarified the strategic
differences between human and software agents. We will carry this research
program forward by integrating the knowledge obtained from both types of
agent simulation. It is also interesting that the results indicate the
usefulness of the U-Mart system as an educational tool for both economics
and information science.

Finally, we are grateful to the participants of Pre U-Mart 2000 and to
everyone concerned with the 6th Emergent System Symposium. We would also
like to thank Dr. Deguchi, Graduate School of Economics, Kyoto University,
who provided the opportunity for the human agent simulation using the
U-Mart system, and the students who participated in the experiments.



Constructing agents whose trading strategies have adequate rationality and
variety is an intrinsic requirement for artificial market study.
Differences in preference for return and risk among agents are one
candidate source of the variety of trading strategies. Agent design can
then be treated as a multi-objective optimization problem that takes both
criteria as objective functions. This paper proposes a multi-objective
genetic algorithm (MOGA) approach to the construction of trading agents for
an artificial market. The U-Mart system, an artificial market simulator, is
used as a test bed. Agents are evaluated in the U-Mart together with other
agents having simple strategies, and are evolved with the MOGA. Computer
simulation shows that various agents having non-dominated trading
strategies can be obtained with this approach.



16.1 Introduction 

Given the complex behavior of prices in real markets and the limitations of
conventional economic theories, the analysis of markets using agent-based
simulation, called artificial markets, is attracting attention [16.1, 16.2,
16.4]. Some simulation models for artificial markets employ rather simple
agents so as to establish a clear relationship between the microscopic
behavior of the agents and the macroscopic behavior of the market. Other
models use more complex agents to study adaptation, learning, and evolution
of the agents in the market. Artificial market study requires, on the one
hand, that the agents have trading strategies with adequate rationality as
a model of microscopic economic behavior and, on the other hand, that their
strategies have enough variety to form prices in the market. Constructing
agents that meet these requirements is therefore one of the key issues in
artificial market study.

In this paper, we regard differences in preference for ‘return’ and ‘risk’
among agents as one of the important sources of variety in trading
strategies, and study the problem of designing agents as a multi-objective
optimization problem that takes both criteria as objective functions. A
multi-objective genetic algorithm (MOGA) is taken as the approach to the
construction of trading agents. For this study we use the U-Mart system, an
artificial market simulator developed as a common test bed in this field.

T. Terano et al. (Eds.): JSAI 2001 Workshops, LNAI 2253, pp. 132-141, 2001.
© Springer-Verlag Berlin Heidelberg 2001

16. A Multi-objective Genetic Algorithm Approach 133

This paper is organized as follows. Following this introduction, the U-Mart
system and multi-objective genetic algorithms are briefly explained in
Sections 16.2 and 16.3, respectively. Section 16.4 describes the structures
of the agents and the implementation of the MOGA for this study. Section
16.5 shows the results of numerical experiments, and Section 16.6 concludes
this study.



16.2 The U-Mart System 

The study of the variety of trading strategies, their learning and
evolution, the emergent behavior of markets populated by them, and the
indirect control of markets through institutional design requires
artificial market systems with adequate complexity. Inspired by RoboCup
[16.6], the ‘U-Mart’ research program was organized and the U-Mart system
has been developed [16.4].

The U-Mart system has the following characteristics:

- In the U-Mart, futures on an existing stock index are traded. Thus, the
  complexity of the real world is introduced while the artificial market
  keeps its ability to form prices autonomously.

- The U-Mart system can be used for experiments with program agents, human
  traders, and mixtures of the two. It thus makes possible various research
  plans in both the economics and the computer science communities.

- The U-Mart takes a server (futures market) / client (trading agent)
  structure over TCP/IP. Communication between the server and the clients
  is regulated by a readable text-based protocol called the Simple Virtual
  Market Protocol (SVMP). This makes it possible to develop servers and
  trading agents in parallel on various platforms, and to carry out
  experiments over the Internet.

- The U-Mart server is implemented in Java, with experiments on various
  platforms in mind.

In August 2000, the first open trading contest, limited to program agents
(Pre U-Mart 2000), was held. More than 10 teams from both the economics and
the computer science fields participated. This experiment showed the
feasibility of the research program. Further, the U-Mart system has also
been used for education in both computer science and economics [16.5].



16.3 Multi-objective Genetic Algorithms (MOGA) 

A multi-objective optimization problem (MOP) is a problem of optimizing
multiple objectives simultaneously. In general, there are trade-offs among
the objectives, and therefore usually no single solution is optimal for all
of them. Non-dominated solutions (or Pareto optimal solutions) are
considered as the rational solutions of the MOP. A solution is
non-dominated if no other solution is at least as good in every objective
and strictly better in at least one. Hence, the goal of a solver for the
MOP is to obtain the set of non-dominated solutions, called ‘the Pareto
optimal set’.
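
The dominance relation can be made concrete with a small sketch. It assumes
that every objective is to be minimized, so a (return, risk) pair is
encoded as (negated return, risk); the function names and the sample values
are illustrative only.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def non_dominated(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Each candidate is (negated return, risk); the numbers are made up.
candidates = [(-0.10, 0.04), (-0.05, 0.01), (-0.02, 0.03),
              (0.00, 0.00), (-0.10, 0.09)]
front = non_dominated(candidates)
```

Here (-0.02, 0.03) is dropped because (-0.05, 0.01) has both a higher
return and a lower risk, and (-0.10, 0.09) is dropped because (-0.10, 0.04)
offers the same return at lower risk. The do-nothing point (0.00, 0.00)
survives, since nothing else has zero risk.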

Genetic algorithms (GAs) are optimization techniques inspired by the theory
of evolution by natural selection [16.7, 16.8]. In a GA, a population of
candidate solutions is evolved by repeated application of genetic operators
such as selection/generation alternation, crossover/recombination, and
mutation. Multi-objective genetic algorithms (MOGAs) are GAs that try to
obtain various Pareto optimal solutions of a MOP simultaneously, making use
of the population-based search of GAs [16.9].

A MOGA is constructed by extending a single-objective GA with

- mechanisms for selecting non-dominated solutions in the population as
  survivors, so that the population evolves closer to the Pareto optimal
  set, and

- mechanisms for maintaining the diversity of the population, so that the
  population covers the whole Pareto optimal set well.

Details of the implementation of the MOGA in this study are discussed in
the next section.



16.4 Construction of Trading Agents with a MOGA 

16.4.1 Structure of Trading Agents 

The U-Mart system carries out simulation in discrete time steps
$t = 1, \dots, t_{end}$. In period $t$, each trading agent can observe

- $S(t) = \{s(1), \dots, s(t-1)\}$: spot prices of the stock index up to
  the previous period $t-1$,

- $F(t) = \{f(1), \dots, f(t-1)\}$: futures prices in the U-Mart up to the
  previous period $t-1$,

- $position(t-1)$: the position of the agent,

- $cash(t-1)$: the amount of cash possessed by the agent, and

- $rest(t)$: the remaining time up to the final period.

Observing these variables, each agent must decide its action, consisting of

- $p(t)$: limit price of the order,

- $sb(t)$: type of order, i.e., sell or buy,

- $q(t)$: quantity of the order,

for each period $t$.

Hence, the strategy of the agent can be formalized as the following
function $Strategy$:

$$(p(t), sb(t), q(t)) = Strategy(S(t), F(t), position(t-1), cash(t-1), rest(t-1)) \tag{16.1}$$
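
Read as a programming interface, Eq. (16.1) maps the observed histories and
account state to one order. The sketch below is only an illustration of
that shape: the `Order` class, the `Strategy` type alias and the toy
strategy `follow_spot` are hypothetical names, not part of the U-Mart
client library.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class Order:
    price: float    # p(t): limit price
    side: str       # sb(t): "sell" or "buy"
    quantity: int   # q(t): quantity of the order

# Arguments: spot history, futures history, position, cash, remaining time.
Strategy = Callable[[Sequence[float], Sequence[float], int, float, int], Order]

def follow_spot(spots, futures, position, cash, rest):
    """Toy strategy: quote the latest spot price; buy if futures are below it."""
    side = "buy" if futures[-1] < spots[-1] else "sell"
    return Order(price=spots[-1], side=side, quantity=1)

order = follow_spot([3000, 3010], [2990, 3005], position=0, cash=1e6, rest=58)
```

Any function with this signature can stand in for an agent, which is what
lets a genetic algorithm treat the parameters of one strategy family as a
chromosome.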

We have constructed agents with the following two structures.

Model 1. This model is a type of agent whose strategy is based on technical
analysis, i.e., time series prediction of the prices. The agent consists of
the following three parts:

Risk Management Part : In the U-Mart, maintaining an adequate position so
as to avoid bankruptcy is a basic requirement for program agents. In this
model, the agent keeps track of the maximum price change $maxd$ over the
past $n$ periods,

$$maxd = \max_{\tau = t-n, \dots, t-1} |f(\tau) - f(\tau - 1)| \tag{16.2}$$

and obtains a pessimistic estimate of its assets, based on the history of
the past $n$ periods and assuming that it keeps its current position, as
follows:

$$cash - (margin + maxd \times unit) \times position \tag{16.3}$$

where $margin$ is the margin deposited in the market for the contracted
orders and $unit$ is the trading unit. If this estimate becomes negative,
it means bankruptcy. Hence, the maximal possible position, say
$position^*$, can be estimated as the solution of

$$cash - (margin + maxd \times unit) \times position^* = 0. \tag{16.4}$$
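
A worked instance of Eqs. (16.2) and (16.4) may help. The sketch below
assumes the futures history is a plain list ending at period t-1; the
function names and the numerical values of cash, margin and unit are made
up for illustration.

```python
def max_price_change(futures, n):
    """maxd of Eq. (16.2): largest one-step change over the last n periods."""
    recent = futures[-(n + 1):]            # n differences need n + 1 prices
    return max(abs(b - a) for a, b in zip(recent, recent[1:]))

def max_position(cash, margin, unit, maxd):
    """position* solving Eq. (16.4)."""
    return cash / (margin + maxd * unit)

futures = [3000, 3010, 2985, 2990, 3020]
maxd = max_price_change(futures, n=3)      # changes considered: 25, 5, 30
limit = max_position(cash=1_000_000, margin=300_000, unit=1000, maxd=maxd)
residual = 1_000_000 - (300_000 + maxd * 1000) * limit   # left side of (16.4)
```

With these numbers maxd is 30, so each unit of position ties up 330,000 in
the pessimistic case and the agent can hold about three units.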

Trend Prediction Part : With linear regression analysis, both the spot
prices $s(t)$ and the futures prices $f(t)$ for the past $n$ periods are
fitted as linear functions of the period $t$:

$$\bar{s}(t) = a_s t + b_s, \qquad \bar{f}(t) = a_f t + b_f$$

Further, we assume that the futures price can be explained by a linear
combination of them,

$$\hat{f}(t) = y(t)\,\bar{s}(t) + (1 - y(t))\,\bar{f}(t) = at + b \tag{16.5}$$

where $y(t) \in [0, 1]$ is a combination weight function depending on the
period. In the beginning, the futures price $f$ will be explained better by
$\bar{f}(t)$ than by $\bar{s}(t)$, and hence a small $y(t)$ will be
preferred. Close to the end, using $\bar{s}(t)$ will be better, and hence a
large $y(t)$ will be preferred.

In this model, $y(t)$ is represented by a piece-wise linear function of the
period as shown in Fig. 16.1, and 9 control points are taken as parameters
to be decided.
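
The trend prediction step can be sketched as follows. Two points are
assumptions rather than statements from the text: the 9 control points are
taken to be equally spaced over the trading period, and the weight y(t) is
treated as a constant at the current period, so that the blended slope and
intercept are simply the weighted averages of the two regression lines.

```python
def fit_line(ts, xs):
    """Ordinary least squares fit x = a * t + b; returns (a, b)."""
    n = len(ts)
    mean_t, mean_x = sum(ts) / n, sum(xs) / n
    a = (sum((t - mean_t) * (x - mean_x) for t, x in zip(ts, xs))
         / sum((t - mean_t) ** 2 for t in ts))
    return a, mean_x - a * mean_t

def piecewise_linear(t, t_end, controls):
    """Interpolate control points assumed equally spaced over [0, t_end]."""
    pos = t / t_end * (len(controls) - 1)
    i = min(int(pos), len(controls) - 2)
    frac = pos - i
    return controls[i] * (1 - frac) + controls[i + 1] * frac

def blended_trend(ts, spots, futures, y):
    """Slope a and intercept b of Eq. (16.5) for a fixed weight y."""
    a_s, b_s = fit_line(ts, spots)
    a_f, b_f = fit_line(ts, futures)
    return y * a_s + (1 - y) * a_f, y * b_s + (1 - y) * b_f

controls = [i / 8 for i in range(9)]            # 9 control points, 8 segments
y_mid = piecewise_linear(30, 60, controls)      # weight halfway through
a, b = blended_trend([1, 2, 3, 4], [10, 12, 14, 16], [20, 19, 18, 17], y_mid)
```

In this toy case the spot line has slope 2 and the futures line slope -1,
so with a weight of 0.5 the blended trend has slope 0.5.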

Fig. 16.1. Piece-wise linear representation of $y(t)$.

Order Making Part : An order is made based on two plans. The first plan is
based on the trend of the futures price:

- If $a > 0$, take a sell position.

- If $a < 0$, take a buy position.

- Otherwise, take no position.

However, if the trend $|a|$ is too strong, it may indicate some instability
of the market, and holding a large position will then be dangerous.
Considering this risk, the position to be taken (normalized by
$position^*$), $g(a, v)$, is represented as a non-monotonic function of $a$
as follows



$$g(a, v) = \begin{cases}
0 & (a/v < -1) \\
2a/v + 2 & (-1 < a/v < -0.5) \\
-2a/v + 2 & (-0.5 < a/v < 0) \\
2a/v & (0 < a/v < 0.5) \\
-2a/v + 2 & (0.5 < a/v < 1) \\
0 & (1 < a/v)
\end{cases} \tag{16.6}$$



where $v$ is a parameter. Let the position obtained from Eq. (16.6) be

$$po_1 = position^* \times g(a, v). \tag{16.7}$$

The other plan uses the difference between the estimated price
$\hat{f}(t-1)$ and the actual futures price $f(t-1)$. The position to be
taken, $po_2$, is given by

$$po_2 = position^* \times d \times \frac{\hat{f}(t-1) - f(t-1)}{\max_{\tau = 1, \dots, t-1} |\hat{f}(\tau) - f(\tau)|} \tag{16.8}$$

where $d$ is a parameter.

These two positions are combined through a weight parameter $w_1$.
Multiplying the result by a parameter $w_2$ representing the
‘aggressiveness’ of the agent, we obtain the position $po'$ to be taken as
follows:

$$po' = w_2 (w_1 po_1 + (1 - w_1) po_2). \tag{16.9}$$

The difference between $po'$ and the current position,

$$q = po' - position \tag{16.10}$$

is taken as the amount of the order needed to achieve $po'$.

The limit price $p$ is decided by extrapolating the estimated price
$\hat{f}(t)$ $n$ steps into the future.

Genetic Representation : The above strategy is represented by a chromosome
consisting of the following 14 parameters:

- Number of steps $n \in [2, 60]$ used for linear regression analysis.

- Nine parameters to decide the function $y(t)$.

- Parameter $v \in [0, 100]$ used in the function $g$.

- Parameter $d \in [-2, 2]$ used for deciding $po_2$.

- Weight parameters $w_1$ and $w_2 \in [0, 1]$.
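
How the two plans combine into an order quantity, following Eqs. (16.7) to
(16.10), can be sketched as below. The value of g(a, v) is passed in as a
number rather than computed, and the function name, argument names and
sample values are illustrative assumptions.

```python
def plan_order(position_star, position, g_value, d,
               est_error, max_est_error, w1, w2):
    """Order quantity q of Model 1.

    est_error is the estimated minus the actual futures price at t-1, and
    max_est_error is the largest absolute such error observed so far.
    """
    po1 = position_star * g_value                          # Eq. (16.7)
    po2 = position_star * d * est_error / max_est_error    # Eq. (16.8)
    target = w2 * (w1 * po1 + (1 - w1) * po2)              # Eq. (16.9)
    return target - position                               # Eq. (16.10)

q = plan_order(position_star=10, position=2, g_value=0.5, d=1.0,
               est_error=5.0, max_est_error=10.0, w1=0.5, w2=1.0)
```

Both plans propose a position of 5 units here, so the target is 5 and the
agent, already holding 2 units, orders 3 more. Lowering `w2` scales the
target down, which is the sense in which it encodes aggressiveness.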

Model 2. This model takes a strategy based on arbitrage.

Risk Management Part : As in Model 1, the maximum possible position
$position^*$ is calculated.

Evaluation of Arbitrage Opportunity : This model decides the position to be
taken based on the difference between the spot price and the futures price:

$$po = position^* \times y(t) \times \frac{s(t-1) - f(t-1)}{\max_{\tau = t-m, \dots, t-1} |s(\tau) - f(\tau)|}$$

where $0 \le y(t) \le 1$ is a weight function, and $m$ is the size of the
window used to evaluate the arbitrage opportunity. As in Model 1, $y(t)$ is
represented by a piece-wise linear function consisting of 8 segments.

Order Making Part : The amount of the order is decided as follows:

$$q = po - position$$

The limit price is set equal to the latest spot price.

Genetic Representation : The above strategy is represented by a chromosome
consisting of the following 11 parameters:

- Number of steps $n \in [2, 60]$ used for risk management.

- Parameter $m$ used for assessment of the arbitrage opportunity.

- Nine parameters to decide the function $y(t)$.
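
The arbitrage rule of Model 2 can be illustrated with a few lines of code.
The sketch assumes that the price histories are lists ending at period t-1;
the guard for a window in which spot and futures prices coincide is an
addition of this sketch, not something stated in the text.

```python
def arbitrage_order(position_star, position, y, spots, futures, m):
    """Order quantity q of Model 2 for a given weight y = y(t)."""
    gaps = [s - f for s, f in zip(spots[-m:], futures[-m:])]
    scale = max(abs(gap) for gap in gaps)
    if scale == 0:                 # no gap in the window: keep the position
        return 0.0
    po = position_star * y * gaps[-1] / scale
    return po - position

q = arbitrage_order(position_star=10, position=1, y=0.5,
                    spots=[100, 102, 104], futures=[101, 100, 102], m=3)
```

The latest gap (2) equals the largest gap in the window, so the normalized
signal is 1 and the target position is 10 x 0.5 = 5 units; with 1 unit
already held, the order is for 4 units.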



16.4.2 Implementation of MOGA 



Objective Functions. The performance of a strategy taken by an agent is
measured by

$$ProfitRatio = \frac{FinalProperty - InitialProperty}{InitialProperty} \tag{16.11}$$



As the objective functions representing return and risk, the mean and the
variance of the ProfitRatio over 30 simulation runs with different spot
price series are used. The number of simulation runs used to evaluate an
individual was decided through preliminary experiments, considering the
trade-off between the stability of the results and the computation time.

Market Configurations. Each individual in the population is evaluated
independently. That is, each individual is put into a separate market with
prescribed agents, and its performance in that market is evaluated.
Concerning the composition of the market, we used two configurations:

Configuration 1 : The market consists of the agent to be evaluated and 20
other agents having rather simple strategies, as follows:

Type r : 5 agents that generate orders with random prices around the
previous futures price.

Type s : 5 agents that generate orders with random prices around the
previous spot price.

Type t : 5 agents that buy futures if the previous price is higher than
the one before, and sell otherwise, following the trend of the market.

Type a : 5 agents that buy futures if the previous price is lower than
the one before, and sell otherwise. That is, they are anti-trend traders.

Configuration 2 : The following 9 agents, developed in an educational
program using the U-Mart system held at Tokyo Institute of Technology, are
added to Configuration 1:

Agent 1 : An agent that utilizes moving averages of the spot prices with 
large and small windows. 

Agent 2 : An agent that utilizes large and medium window moving aver- 
ages of both the spot prices and the futures prices. 

Agent 3 : An agent that utilizes moving average of the futures prices. 

Agent 4 : An agent that utilizes current futures price, the moving aver- 
ages and their variances. 

Agent 5 : An agent that utilizes the differences of the spot price and 
futures price, and variation of the futures prices. 

Agent 6 : An arbitrager that decides position based on the difference 
between the prices of spot and futures. 

Agent 7 : An agent that decides order based on the difference of the 
futures price and average price of its contracted orders. 

Agent 8 : An agent that utilizes quadratic approximation of the price 
curve and tries to capture the peak and bottom of the prices so as 
to decide its order. 

Agent 9 : An agent that makes orders using the strategy of ‘Type t’ if 
the property is larger than the initial value, and ‘Type a’ otherwise. 
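
The decision rules of the Type t and Type a agents, and of Agent 9, which
switches between them, are simple enough to state as code. This is a sketch
of the buy/sell decision only; prices and quantities of the orders are not
specified in the text and are left out, and the function names are
illustrative.

```python
def trend_follower(prev_price, before_price):
    """Type t: buy if the previous price is higher than the one before."""
    return "buy" if prev_price > before_price else "sell"

def anti_trend(prev_price, before_price):
    """Type a: buy if the previous price is lower than the one before."""
    return "buy" if prev_price < before_price else "sell"

def agent9(prev_price, before_price, prop, initial_property):
    """Agent 9: follow the trend while in profit, trade against it otherwise."""
    if prop > initial_property:
        return trend_follower(prev_price, before_price)
    return anti_trend(prev_price, before_price)

rising = (3010, 3000)   # (previous price, price before that)
decisions = (trend_follower(*rising), anti_trend(*rising),
             agent9(*rising, prop=900, initial_property=1000))
```

On a rising price the trend follower buys while the anti-trend trader
sells, and Agent 9, being below its initial property, sides with the
anti-trend rule.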



Algorithm 

A MOGA algorithm based on PESA [16.10] is used. The outline of the
algorithm is as follows:

1. Generate N initial individuals randomly and evaluate them. Let the
generation counter g = 0.

2. Increment g. If g = G, terminate the algorithm. Otherwise choose two
parents randomly from the population.







3. Let the counter of generated children m = 0.

4. Increment m. If m > M, go to Step 2.

5. Generate a child with UNDX [16.11], and evaluate its objective values.

6. If the child is dominated by one of the individuals in the current
population, go to Step 4.

7. If all the individuals in the current population are non-dominated, go
to Step 9.

8. Replace a dominated individual with the child, and go to Step 4.

9. Replace one of the two individuals nearest to the child in Euclidean
distance with the child, and go to Step 4.

Considering the available computation time and the reliability of the
solution, we set N = 30, M = 1, and the maximum generation G = 10000.
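
The control flow of Steps 1 to 9 can be sketched on a toy problem. Several
parts of this sketch are assumptions and not the setup of this study: a
two-variable analytic function stands in for the U-Mart evaluation, a
simple blend crossover with Gaussian noise replaces UNDX, the nearest
individuals are measured in objective space, and only 500 generations are
run instead of G = 10000.

```python
import random

def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def evaluate(x):
    """Toy objectives standing in for (negated return, risk) from U-Mart runs."""
    return (x[0] ** 2 + x[1] ** 2, (x[0] - 1) ** 2 + x[1] ** 2)

def make_child(p1, p2, rng):
    """Blend crossover plus Gaussian noise; a simple substitute for UNDX."""
    return [a + rng.random() * (b - a) + rng.gauss(0, 0.05) for a, b in zip(p1, p2)]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def run_moga(n=30, children_per_generation=1, generations=500, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(n)]  # Step 1
    objs = [evaluate(x) for x in pop]
    for _ in range(generations):                                       # Step 2
        p1, p2 = rng.sample(pop, 2)
        for _ in range(children_per_generation):                       # Steps 3-4
            child = make_child(p1, p2, rng)                            # Step 5
            c_obj = evaluate(child)
            if any(dominates(o, c_obj) for o in objs):                 # Step 6
                continue
            dominated = [i for i, o in enumerate(objs)
                         if any(dominates(q, o) for q in objs)]
            if dominated:                                              # Step 8
                victim = rng.choice(dominated)
            else:                                                      # Steps 7, 9
                nearest = sorted(range(n),
                                 key=lambda i: squared_distance(objs[i], c_obj))
                victim = rng.choice(nearest[:2])
            pop[victim], objs[victim] = child, c_obj
    return pop, objs

population, objectives = run_moga()
mean_total = sum(f1 + f2 for f1, f2 in objectives) / len(objectives)
```

Replacing a dominated individual first (Step 8) pushes the population
toward the Pareto optimal set, while replacing a near neighbor of the child
(Step 9) keeps the surviving solutions spread out once every individual is
non-dominated.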

Suppression of Non-active Agents. A preliminary experiment with a
single-objective GA that considers only ‘return’ showed that

1. Initial individuals generated randomly usually yield negative returns.

2. Along the evolution path of the strategies, the return of the agent
improves gradually while a rather large risk is kept, and finally a
positive return is achieved.

In the multi-objective GA, the ‘do nothing’ strategy, which yields no
return with no risk, dominates most of the initial population. Hence, in
runs of a naively implemented MOGA, we observed a tendency for the
population to converge to such useless strategies.

To avoid this phenomenon, we evaluate each strategy starting from a given
initial position. That is, in 10 of the 30 runs the agent starts trading
with an initial position of 300 units sold, in 10 runs with 300 units
bought, and in the remaining runs with no position. The size of the initial
position was decided by trial and error in the preliminary experiments.
Thus, even non-active agents face risk due to the initial position, and are
therefore under more selection pressure than in the naive implementation.
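
The evaluation of one individual, combining Eq. (16.11) with the initial
position scheme, can be sketched as follows. The `simulate` callback is a
hypothetical stand-in for one U-Mart run; representing a sell position by a
negative number and using the population variance are assumptions of this
sketch.

```python
from statistics import mean, pvariance

def initial_positions(runs=30, size=300):
    """One third of the runs start short, one third long, the rest flat."""
    third = runs // 3
    return [-size] * third + [size] * third + [0] * (runs - 2 * third)

def evaluate_individual(simulate, runs=30):
    """Return (mean, variance) of ProfitRatio over the runs.

    simulate(run_index, initial_position) must return the pair
    (initial_property, final_property) of one market run.
    """
    ratios = []
    for k, pos in enumerate(initial_positions(runs)):
        initial, final = simulate(k, pos)
        ratios.append((final - initial) / initial)    # Eq. (16.11)
    return mean(ratios), pvariance(ratios)

def do_nothing(run_index, initial_position):
    """Toy run of a non-active agent: its property moves with its position."""
    return 1000.0, 1000.0 + initial_position

ret, risk = evaluate_individual(do_nothing)
```

In this toy setting the non-active agent still ends up with zero mean
return, but its risk is no longer zero, so it stops dominating active
strategies with modest risk.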



16.5 Results of Experiments 

The results of the experiments are shown in Figs. 16.2 (a) to (d). These
figures show the distribution of the objective values of the agents in the
market. Good solutions have large values of return and small values of
risk, and are therefore located in the lower right area of the figures. In
these figures, the curve $y = x^2$ is also plotted. If the profit ratio
follows a normal distribution, strategies under this curve yield a positive
return with a probability of more than 84%. In Fig. 16.2 (d), the curve
$y = x^2/4$, corresponding to a 98% probability of positive return, is also
plotted.
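
The two percentages can be checked numerically. The sketch assumes that the
horizontal axis is the mean of the profit ratio and the vertical axis its
variance, so that the two reference curves are variance = mean^2 and
variance = mean^2 / 4; this reading is consistent with the quoted 84% and
98%. The function name is illustrative.

```python
import math

def prob_positive(mean, variance):
    """P(X > 0) for X ~ N(mean, variance), via the error function."""
    return 0.5 * (1 + math.erf(mean / math.sqrt(2 * variance)))

mean_return = 0.1
p_first = prob_positive(mean_return, mean_return ** 2)        # on y = x^2
p_second = prob_positive(mean_return, mean_return ** 2 / 4)   # on y = x^2 / 4
```

On the first curve the standard deviation equals the mean, so the
probability is the normal distribution function at 1, about 0.841; on the
second the mean is two standard deviations above zero, giving about 0.977.
Points below a curve have smaller variance and hence higher probability.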

For Model 1, the MOGA finds good solutions that dominate the other, simple
strategies in Configuration 1. However, in Configuration 2, i.e., in the
market having more sophisticated agents, the solutions found by the MOGA
are located in a relatively small-risk area and are dominated by some of
the other agents, such as Type s and Agent 6, which are a sort of
arbitrager. It is interesting that the performance of the simple agents of
Types r, s, t and a changes considerably between Configurations 1 and 2.

For Model 2, the solutions found by the MOGA achieve better results. Even
in Configuration 2, the obtained strategies dominate most of the other
agents, and performances under the 98% curve are achieved. This shows the
advantage of arbitrage-based strategies in the futures market.

It should be noted that the agents in the population evolve based on
evaluation in separate markets. Evolution of agents in the same market,
i.e., co-evolution of strategies, is a subject for future study.





Fig. 16.2. Results of Experiments: (a) Model 1 in Configuration 1,
(b) Model 1 in Configuration 2, (c) Model 2 in Configuration 1,
(d) Model 2 in Configuration 2. The horizontal axis of each panel is the
return.



16.6 Conclusion 

This paper has proposed a multi-objective genetic algorithm (MOGA) approach
to the construction of various trading agents for an artificial market.
That is, return and risk are treated as objective functions for designing
trading agents, using the U-Mart system, an artificial market simulator, as
a test bed. Several techniques have also been developed to achieve
efficient evolution of the agents. Computer simulation shows that various
agents having non-dominated trading strategies can be obtained with this
approach.



The need for new theoretical and experimental approaches to understanding
dynamic and heterogeneous behavior in complex economic and social systems
is increasing. Computational simulation with dynamically interacting
heterogeneous agents is expected to be able to reproduce complex phenomena
in economics, and to help us experiment with various controlling methods,
evaluate systematic designs, and extract the fundamental elements that
produce the interesting phenomena through in-depth analysis. To implement
various applications of agent-based simulation effectively, we have
developed a simple framework. We also consider a new application of
agent-based simulation to an environmental study and implement a
preliminary simulation model of international greenhouse gas (GHG)
emissions trading.



17.1 Introduction 

In real economic situations, the dynamic behavior and interactions between 
people are very complicated and may often seem irrational. Further complica- 
ting the situation, the recent progress and popularity of network communica- 
tion technologies greatly widens the diversity of participants and affects the 
market mechanism itself, and increases the dynamic fluctuations of econo- 
mic systems. In the past, traditional economic theories have only considered 
idealized representative participants in equilibrium states. It is very difficult 
to analyze dynamically changing situations involving heterogeneous subjects 
using such static and homogeneous methods. In the last decade, many
researchers, including physicists and computer scientists, have started to
apply new approaches to investigate such complex dynamics in their studies
of economics. One of these approaches is the agent-based simulation
approach.

The term “agent” is often used with different meanings by different
researchers (see Fig. 17.1). For example, the word agent may refer to an
autonomous graphical user interface with animation, a robot that gathers
information from a network, an artificial lifeform, or a distributed
application that collaborates with other components over the network. In
economics, an agent usually means an independent economic entity such as a
household or a firm. However, traditional economic theories usually
consider only representative agents in equilibrium states. By using
simulation technology, we can endow such economic agents with heterogeneous
and dynamic properties. Thus, when we refer to an agent-based simulation,
we mean a simulation study of an economic system composed of heterogeneous
and dynamic economic entities.



Fig. 17.1. Various Concepts of Agents (labels in the figure: User
Interface, Distributed Intelligence, Artificial Life, Network
Infrastructure, Robot, Economic Subject).



Large-scale agent-based simulations have become possible only relatively 
recently, with the advent of fast, cheap, and readily available computers. The 
approach has been championed by physicists using the paradigm of com- 
putational statistical physics. De Oliveira et al. [17.1] review several papers 
from the past few years that exemplify the methodology, especially the work 
of Levy, Levy, and Solomon [17.2]. This opens the door to the study of the 
interaction of large numbers of heterogeneous, interacting agents. 

In this paper, we will introduce a simple framework for agent-based si- 
mulation and three applications: a commodities market, a dynamic online 
auction, and international greenhouse gas emissions trading. 



17.2 Agent-Based Simulation Framework: ASIA 

For effective implementation of agent-based economic and social
simulations, we developed a simple framework, Artificial Society with
Interacting Agents (ASIA), using Java. This framework provides only very
simple and fundamental functionality for social simulations.

Recently, many researchers have begun to investigate agent-based
simulations or artificial markets. A number of agent systems or frameworks
have also been proposed for implementing models systematically. Many of
these frameworks aim at constructing unified structures with
object-oriented design methods (for example, [17.3]), and some of them also
possess an intelligent collaboration mechanism using the network.

Our framework, on the other hand, mainly defines the dynamic interactions
and the trading process as foundations, and leaves the concrete design of
the agents' hierarchy, social structure and individual strategy to the
users. We believe that this difference mainly comes from differences in the
agent concept, as described in the introduction.

We constructed our framework with a layered structure as shown in 
Fig. 17.2. The agent layer contains a basic agent class and the fundamen- 
tal environment for the agents. The environment provides the fundamental 
facilities for agents and users to create agents, to dispose of agents, and to 
send messages through a MessageManager class. 



Application Layer 
Social Layer 
Agent Layer 
Java Virtual Machine 



Fig. 17.2. Layer Structure in ASIA. 



The MessageManager collects and distributes messages sequentially in its
own thread according to a predetermined schedule. Agents also have their
own threads to process the distributed messages. Thus, users of the upper
layers can construct parallel communication among agents without worrying
about the message passing mechanism.
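
The division of labor between a message manager thread and per-agent
threads can be sketched as follows. ASIA itself is written in Java and its
class interfaces are not given here, so this Python sketch only illustrates
the pattern: the class layout, the method names and the use of `None` as a
shutdown signal are assumptions, and the predetermined schedule is reduced
to first-in, first-out delivery.

```python
import queue
import threading

class Agent(threading.Thread):
    """An agent that processes delivered messages in its own thread."""
    def __init__(self, agent_id):
        super().__init__(daemon=True)
        self.agent_id = agent_id
        self.inbox = queue.Queue()
        self.received = []

    def run(self):
        while True:
            message = self.inbox.get()
            if message is None:            # shutdown signal
                break
            self.received.append(message)

class MessageManager(threading.Thread):
    """Collects messages and distributes them sequentially in its own thread."""
    def __init__(self, agents):
        super().__init__(daemon=True)
        self.agents = {agent.agent_id: agent for agent in agents}
        self.pending = queue.Queue()

    def send(self, recipient, body):
        self.pending.put((recipient, body))

    def shutdown(self):
        self.pending.put(None)

    def run(self):
        while True:
            item = self.pending.get()
            if item is None:               # forward the shutdown to all agents
                for agent in self.agents.values():
                    agent.inbox.put(None)
                break
            recipient, body = item
            self.agents[recipient].inbox.put(body)

agents = [Agent("p1"), Agent("p2")]
manager = MessageManager(agents)
for thread in agents + [manager]:
    thread.start()
manager.send("p1", "RFB")
manager.send("p2", "RFB")
manager.send("p1", "INFO")
manager.shutdown()
for thread in [manager] + agents:
    thread.join()
```

Callers only use `send`; the queues take care of synchronization, which is
the sense in which upper layers need not deal with message passing.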

The social layer describes the basic role of agents in the society and gives 
the example of message exchanges for trade. We implemented Central, Par- 
ticipant, and Watcher agents and a simple market process using RFB and 
BID messages. The Central agent creates, registers and initiates Participant 
agents and Watcher agents. Users can start, stop, and reset trading through 
the GUI window provided by the Central agent. 

One sample trade procedure can be executed as follows (see Fig. 17.3). To
begin a trade, the Central agent sends a Request For Bid (RFB) message to
each Participant. Upon receiving an RFB message, a Participant agent
replies with a BID message. The Central agent collects all of the BID
messages and proceeds to the trade transaction, provided the users have
customized the descendant class appropriately. Finally, each Watcher agent
receives information about the trade and reports it to the users in the
desired format.

Fig. 17.3. Message Transactions in the Social Layer.

The social layer only determines a formal procedure for trading and the 
users must customize the behavior of agents at the Application layer. 
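
The RFB/BID exchange can be illustrated with a synchronous sketch. It is
written in Python rather than Java and does not reproduce the ASIA classes;
the method names, the dictionary message format and the default rule that
the highest bid wins are assumptions made for the example.

```python
class Participant:
    def __init__(self, name, price):
        self.name, self.price = name, price

    def on_rfb(self, rfb):
        """Reply to a Request For Bid with a BID message."""
        return {"type": "BID", "from": self.name, "price": self.price}

class Watcher:
    def __init__(self):
        self.reports = []

    def on_info(self, info):
        self.reports.append(info)

class Central:
    def __init__(self, participants, watchers):
        self.participants, self.watchers = participants, watchers

    def settle(self, bids):
        """Application-layer hook; here the highest bid wins (an assumption)."""
        return max(bids, key=lambda bid: bid["price"])

    def trade(self):
        rfb = {"type": "RFB"}
        bids = [p.on_rfb(rfb) for p in self.participants]   # collect BIDs
        result = self.settle(bids)                           # trade transaction
        for watcher in self.watchers:                        # report the outcome
            watcher.on_info(result)
        return result

watcher = Watcher()
central = Central([Participant("a", 10), Participant("b", 12)], [watcher])
winner = central.trade()
```

An application would subclass `Central` and override `settle`, which
mirrors the way the social layer fixes the procedure and leaves the
behavior to the application layer.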

In the following sections, we will give example applications using this 
framework. 



17.3 Market Simulation 

The stability of prices in asset markets is clearly a central issue in economics. 
From a systems point of view markets inevitably entail the feedback of infor- 
mation in the form of price signals, and like all feedback systems may exhibit 
unstable behavior. 

K. Steiglitz and D. Shapiro produced price oscillations and bubbles in a
simple commodity market with producer/consumer agents and two types of
speculators [17.4]. H. Mizuta, K. Steiglitz and E. Lirov considered the
stability of this model with various price signals and found that the
anti-weighted average of bid prices stabilizes the market dramatically
[17.5].

In this section, we reproduce the simulation model described in [17.5] with 
the ASIA framework. 

We use two commodities: food and gold. 

As a descendant of the Central agent class, we consider a central
auctioneer. There are three kinds of Participant agents. Regular agents
produce food or gold and consume food; value traders and trend traders are
solely speculators.

One trading period is executed as follows. The auctioneer sends to each 
agent a Request For Bid (RFB) containing price signals. Consider first 
the case when the price signal is simply the previous closing price, as 
in [17.4, 17.6]. Based on this signal, the regular agents decide on their le- 
vels of production for that time step and speculators update their estimates. 
The agents then send bids to sell or buy. Finally, the market treats the sub- 
mitted bids as a sealed-bid double auction and determines a single price which 
maximizes the total amount of food to be exchanged. 
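
The price determination of a sealed-bid double auction can be sketched as a
search over candidate prices. The bid format, the restriction of candidates
to submitted limit prices and the tie-breaking rule (the lowest price among
those with maximal volume) are assumptions of this sketch, not details
taken from [17.5].

```python
def clearing_price(buys, sells):
    """Return (price, volume) maximizing the exchanged amount.

    buys and sells are lists of (limit_price, quantity). A buy bid is
    executable at any price up to its limit, a sell bid at any price from
    its limit upward.
    """
    candidates = sorted({p for p, _ in buys} | {p for p, _ in sells})
    best_price, best_volume = None, 0
    for price in candidates:
        demand = sum(q for p, q in buys if p >= price)
        supply = sum(q for p, q in sells if p <= price)
        volume = min(demand, supply)
        if volume > best_volume:           # strict: keep the lowest such price
            best_price, best_volume = price, volume
    return best_price, best_volume

buys = [(10, 5), (9, 5), (8, 5)]
sells = [(7, 4), (9, 4), (11, 4)]
price, volume = clearing_price(buys, sells)
```

At a price of 9, ten units are demanded and eight are offered, so eight
units change hands, more than at any other candidate price.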

In each trading period the regular agents can produce either food or gold. 
They make this production decision to maximize profit, but in a shortsighted 
way, based only on the current price and their production skills. 

Fig. 17.4 shows a screen shot of the system. The PriceAmount Watcher 
window shows two graphs showing the market clearing price and the trade 
volume. 

In our previous work we showed that the price oscillation with Regular
agents is stabilized by introducing different price signals. On the basis
of the simulation, we also gave analytical results on the simplified
dynamical system with different signals in [17.5].

Fig. 17.4. Market Simulation with Agent Framework ASIA, showing a price
bubble.



17.4 Dynamic Online Auctions 

The use of online auctions is rising at a dramatic rate, and in general many 
segments of the economy are becoming granulated at a finer and finer scale. 
Thus, understanding behavior in auctions, and especially the interaction bet- 
ween the design of auctions, agent behavior, and the resulting allocations of 
goods and money has become increasingly important — first because we may 
want to design auctions that are as profitable as possible from the sellers’ 
point of view, but also because we may want to bid in auctions, or design 
computer systems that respond well to the loads that auctions generate. To 
investigate such dynamic interactions between heterogeneous bidders and the 
price formulation through successive auctions, H. Mizuta and K. Steiglitz de- 
veloped an agent-based simulation of dynamic online auctions [17.7]. In this 
section, we re-implement the auction simulation on the ASIA framework. 

The model considers a single auction involving the sale of one item by one 
seller to one of n bidders, who submit their bids over time in the interval 
[0,T) to an auctioneer, who awards the item to the highest bidder at closing 
time. A bidder can submit more than one bid during the auction. We define 
the auctioneer as a Central agent and the bidders as Participant agents. 

The starting bid price is fixed at 1, and the duration of the auction is 500 
time units. 

At the beginning of each auction, each bidder determines his first va- 
luation of the item. At each time period 0 < t < T, each bidder receives 
the status of the auction, can update his estimation on a fixed schedule or 
probabilistically, and can submit bids if the conditions for his strategy are 
satisfied. 

We consider two different types of bidders: early bidders, who can bid 
any time during the auction period, update their valuations continuously 
and compete strongly with each other, and snipers, who wait until the last 
moments to bid. We can briefly characterize the strategy of early bidders as 
watch/modify/bid, and that of snipers as wait/bid. 

An example auction simulated by the complete system is shown in 
Fig. 17.5. 
Fig. 17.5. Sample Auction Simulation with Agent Framework ASIA.



17.5 Greenhouse Gas Emissions Trading 

In this section, we consider the application of the agent-based simulation for 
the international greenhouse gas (GHG) emissions trading under the Kyoto 
Protocol (KP). 

To prevent global warming, 160 countries agreed to the KP on limiting 
GHG emissions at COP3 in 1997. The KP sets targets for Annex I countries 
at assigned reductions below the 1990 levels, with the targets to be met 
during the commitment period 2008-2012. For example, Japan and the US 
should reduce their emissions by 6% and 8%, respectively. The KP allows 
international GHG emissions trading, where countries that cannot reach 
their reduction targets can buy emission rights from other countries that 
can easily satisfy their targets. Such a market mechanism is expected to 
reduce the worldwide cost of GHG reduction because of the large range in 
the marginal abatement cost curves (MACs) for reducing GHG emissions. 

In the previous two sections, we have applied the simulation to relatively 
traditional market systems, that is, a commodities market and an online 
auction. Now we will investigate the anticipated properties of an emerging 
new market through a simulation study. Such a study in advance is important 
to establish efficient rules, but difficult without simulation. 

J. Grütter [17.8] developed the CERT model, which calculates the 
equilibrium price with various options and parameters for MACs. The CERT 
model treats only one trade in 2010 and each country must achieve the 
targets in that year. Because this model is implemented with a spreadsheet 
and macros, it is difficult to expand the model to treat successive trades 
and to assign different strategies to different countries. 

Now we have developed a prototype for GHG emissions trading with 
the ASIA framework. Because we modeled countries as agents, we can easily 
modify the behavior of each country and investigate the dynamic interactions 
between heterogeneous strategies. 

The structure of the simulation system is as follows. The COP agent is 
a descendant of the Central agent and manages the international trading. 
The Nation agents are descendants of the Participant agent and correspond 
to countries or groups. In this model, we created 12 Nations; 6 are Annex I 
countries and 6 are Non Annex I countries who are not assigned targets for 
reduction. Nations behave autonomously and independently to achieve the 
assigned KP targets with minimum costs or to receive maximum profits from 
the trades. 

Fig. 17.6 shows the basic trading procedure through message exchanges. 
We consider both a static equilibrium market with only one trade in 2010, 
as was discussed in [17.8], and dynamic market development through the 
commitment period 2008-2012. In each trading year, the COP agent sends 
Request for Bid (RFB) messages, each containing an asking price, to all 
Nations. Upon receiving the RFB message, a Nation agent examines the 
asking price and its MAC to decide the amount of its domestic reduction. 
It then sends back a Bid message to the COP agent stating how much it 
wants to buy or sell at the asking price. After repeating this RFB-Bid 
process, the COP agent finds the equilibrium price at which demand and 
supply balance, and sends the Trade message to approve the trades for the 
year. Thus, the equilibrium price for each year is determined once the MAC 
functions and the assigned reductions of all of the participants are given. 
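
The RFB/Bid loop can be sketched as a price search. The sketch below is not the ASIA implementation: it assumes a quadratic MAC, MAC(x) = a x^2 + b x, for each Nation and uses bisection on the asking price, whereas the text does not say how the COP agent updates the price between rounds. The class and function names are hypothetical.

```python
import math


class Nation:
    """Hypothetical Participant agent with MAC(x) = a*x**2 + b*x (a > 0)."""

    def __init__(self, a, b, target):
        self.a, self.b, self.target = a, b, target

    def domestic_reduction(self, price):
        # Reduce domestically until the marginal cost equals the asking price.
        return (math.sqrt(self.b ** 2 + 4 * self.a * price) - self.b) / (2 * self.a)

    def bid(self, price):
        # Positive: wants to buy emission rights; negative: wants to sell.
        return self.target - self.domestic_reduction(price)


def find_equilibrium(nations, lo=0.0, hi=1000.0, tol=1e-9):
    """Repeat RFB/Bid rounds, bisecting on the asking price (an assumption)."""
    while hi - lo > tol:
        ask = (lo + hi) / 2
        excess_demand = sum(n.bid(ask) for n in nations)
        if excess_demand > 0:
            lo = ask  # buyers outweigh sellers: raise the asking price
        else:
            hi = ask
    return (lo + hi) / 2


# One country with a target and one without (illustrative parameters).
nations = [Nation(1.0, 0.0, 4.0), Nation(1.0, 0.0, 0.0)]
price = find_equilibrium(nations)
trades = [n.bid(price) for n in nations]
```

In this two-Nation example both agents end up with the same marginal cost, and the Nation without a target sells exactly what the other buys.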





Fig. 17.6. Trading Procedure. 



Then we considered multiple trading periods. Nation i divides up the 
assigned total reduction $R_i$ among the trading periods n = 0, 1, 2, . . . , 

$$R_i = \sum_n R_{in}.$$




As described previously, we can find the equilibrium price $P^*$ for each 
year using a partition of the assigned reduction and a MAC function at 
this time. To consider the dynamics of MAC, we introduce a technology 
function $t_{in}(p)$ which gives the amount of reduction using the 
available technology at a given cost p for the Nation i at the year n. 
Then the MAC is given as the inverse function of the integral of the 
technology function. 

For each year, all countries determine the amount of the domestic 
reduction with which the values of MAC for all countries agree with one 
international value, that is, the equilibrium price, to minimize the 
worldwide reduction cost. Similarly, they try to minimize the total cost 
over the commitment period by choosing the partition $R_{in}$ 
(n = 0, 1, 2, . . . ) for the assigned reduction which has the smallest 
variance in the differential coefficient of the total cost for each 
trading period. 

As a simple dynamic process for the reduction technology we adopt 
reusability $0 < \alpha < 1$ and deflation $0 < \gamma = 1/\beta < 1$. 
Once the technology whose cost is lower than the price $P^*$ is used, the 
reusability of the technology will be restricted with the coefficient 
$\alpha$. On the other hand, technical innovations and deflation decrease 
the cost of each technology. With 
$P_{in} = \max\{\gamma^{n} P^*_0, \ldots, \gamma P^*_{n-1}\}$, we can 
obtain the technology function as 

$$t_{in}(p) = \begin{cases} \alpha\,\beta^{n}\, t_{i0}(\beta^{n} p) & p < P_{in} \\ \beta^{n}\, t_{i0}(\beta^{n} p) & \text{otherwise.} \end{cases}$$

We set the initial technology function to be $t_{i0}(p)$ with two 
coefficients $a_i$ and $b_i$ to reproduce the quadratic MAC function used 
in the CERT model, 

$$t_{i0}(p) = \frac{1}{\sqrt{b_i^2 + 4 a_i p}}.$$
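
As a numerical check of the relation between the technology function and the MAC, the following sketch assumes the initial technology function has the form t(p) = 1/sqrt(b^2 + 4ap) and that the quadratic MAC is MAC(x) = a x^2 + b x. Integrating t from 0 to p gives the reduction available at cost p, and applying the MAC to that reduction should return p. The parameter values are illustrative, not CERT parameters.

```python
import math


def t0(p, a, b):
    """Assumed initial technology function t(p) = 1 / sqrt(b^2 + 4 a p)."""
    return 1.0 / math.sqrt(b * b + 4.0 * a * p)


def reduction_at_cost(p, a, b, steps=100_000):
    """Integrate t0 from 0 to p with the midpoint rule."""
    h = p / steps
    return sum(t0((k + 0.5) * h, a, b) for k in range(steps)) * h


def quadratic_mac(x, a, b):
    """Assumed quadratic marginal abatement cost a*x^2 + b*x."""
    return a * x * x + b * x


a, b, p = 0.5, 2.0, 30.0          # illustrative values only
x = reduction_at_cost(p, a, b)    # reduction achievable up to cost p
recovered_price = quadratic_mac(x, a, b)
closed_form = (math.sqrt(b * b + 4 * a * p) - b) / (2 * a)
```

The closed form used for comparison is the antiderivative of the assumed technology function evaluated between 0 and p.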

In our simulation, we fixed the parameters $\{a_i\}$, $\{b_i\}$ and 
$\{R_i\}$ for the 12 countries as given in the CERT model and use randomly 
distributed $\{\alpha_i\}$ and $\{\beta_i\}$. Each Nation agent i 
determines the initial partition of the reduction $\{R_{in}\}$ and updates 
the partition after the commitment period so that the variance of the 
marginal reduction cost decreases. 

Fig. 17.7 shows an example of the simulation result. Users can start, stop, 
and reset the trades and select the trading duration in the upper left window 
provided by the COP agent. This main window provides information for each 
Nation’s agents, and buttons to open a GUI window for each Nation. Two 
graphs in the lower left window show the movement of the equilibrium price 
and the trading amount. There are also graphs for the marginal reduction 
cost (upper) and the partition of the assigned reduction (lower) of two Na- 
tions representing USA (left) and Japan (right). By simulating the dynamic 
adjustment of the partition, we can see the worldwide cost reduction and the 
spontaneous selection of strategies. In this particular result, the USA 
chose the late action strategy and Japan chose the early action strategy 
according to their estimates of the rate of technical innovation and other 
circumstances. 

We can observe changes of the total reduction cost for the entire world 
and for each country with the view shown in Fig. 17.8. In the beginning of 
the simulation, all countries fix their partition as the average value through 
the trading period. 

$$R_{in} = R_i / N \quad \text{(for all } i\text{)}.$$

Then they determine the equilibrium price $P^*_n$, the domestic reduction 
$D_{in}(P^*_n)$, and the trading amounts 
$T_{in}(P^*_n) = R_{in} - D_{in}(P^*_n)$ for each trading period. 
Simultaneously, they calculate the marginal reduction cost 
Fig. 17.7. Dynamic GHG Emissions Trading over the commitment period, 
2008-2012. 



$P^*_n + T_{in}(P^*_n)/\tau^*$, where $\tau^* = \sum_j t_{jn}(P^*_n)$. 
This marginal reduction cost represents the approximate effects of the 
partition on the total cost for each country. By adjusting the partition 
after all of the trades so that the marginal reduction cost becomes 
constant over the trading periods, each country expects that the total 
cost will be optimized. Though each country tries selfishly to decrease 
only its own cost, the total cost for the world can be reduced via this 
process as shown in Fig. 17.8. 




Fig. 17.8. Changes of the total costs via adjustment of the partition. 
17.6 Concluding Remarks 

We have developed a dynamic simulation of international GHG emissions 
trading with our agent-based simulation framework, ASIA. In a simulation 
study of international emissions trading, we observed the price formation 
for each trading year and the dynamic improvement of strategies, which 
reduces the total cost. 

The implementation of various types of the agent-based simulation can 
be easily done with this framework, since it offers simple and fundamental 
facilities for agents including messaging, multi-threading, and an example of 
social negotiation transactions in separate layers. We designed the framework 
to be very simple following the well-known KISS (Keep it simple, stupid) 
principle, which enabled us to concentrate on the essential factor in the system 
and to investigate the dynamics. 

At this stage of development, we did not provide intelligence or the net- 
work functions for agents which most other frameworks require, because our 
fundamental concept of an agent does not necessarily require these facilities. 
We do think that a wide range of agent-based simulations can be 
constructed within this framework as it stands. However, we also consider 
that it would be useful for some users if some of these options were 
available in a higher layer as components they can choose. These optional 
components for our framework 
remain for future work. Furthermore, much of the research and analysis re- 
quired to evaluate GHG emissions trading are also left for the future. We 
believe that this preliminary work will help in the effective construction of 
the emerging international market and that such an agent-based approach 
will have more importance in the near future. 
This chapter presents an agent-based approach to solving the tragedy of 
the common. The tragedy of the common is known as the problem of how to 
manage a limited common resource. In the agent-based approach, a 
meta-agent is introduced to restrict the activity of agents by charging 
levies. It is supposed that the meta-agent and the agents do not know the 
payoff function explicitly. Under this setting, the meta-agent tries to 
make a levy plan that restricts the agents' activity, and the agents try 
to predict their payoffs for decision making. To create the levy plan and 
the payoff predictions, genetic algorithms are used in each agent. Through 
the experiments, the formation of a levy plan and of payoff predictions 
that avoid the tragedy is shown. 



19.1 Introduction 

Agent-based social behavior simulation is a research field that treats 
complex game situations and examines artificial intelligence [19.1]. 
Social dilemmas are one class of complex game situations and are well 
suited to examining the intelligence of agents. In this paper, the Tragedy 
of the Common [19.2], which is one of the social dilemmas, is treated in 
an agent-based simulation. In this game, players use common limited 
resources to get rewards. If players behave according to individual 
rationality, all players will face a tragedy and lose the higher payoff. 
To avoid such tragedies, players have to build relationships with other 
agents that prevent selfish behavior, or the problem structure has to be 
changed, for example by changing the payoff functions. The proposed 
approach is a kind of change to the problem structure. That is, a 
meta-agent is introduced to control the levy charged to the players 
[19.3]. In addition, it is assumed that the players do not know the 
structure of the payoff function explicitly. This assumption can be 
thought of as reflecting a part of complex real situations. Under this 
assumption, the objective of the simulation is to show the effectiveness 
of coevolving the levy plan of the meta-agent and the payoff predictions 
of the agents. In the next section, the problem structure of the tragedy 
of the common is introduced. Then the proposed approach is described. 
19.2 The Tragedy of the Common 

The tragedy of the common [19.2] is a famous game problem and one of the 
n-person social dilemmas [19.4]. This game enables us to analyze the 
behavior of players sharing common limited resources. Because the common 
resources are limited, higher activity by agents seeking a higher payoff 
eventually brings a lower payoff. An example of the payoff function is 

$$\mathit{Payoff}_i = a_i \Big(16 - \sum_{j=1}^{4} a_j\Big) - 2 a_i \qquad (19.1)$$

where $\mathit{Payoff}_i$ is the payoff of agent i and $a_i$ represents 
the degree of activity of agent i. Here, 4 agents participate and 4 
degrees of activity, $a_i \in \{0, 1, 2, 3\}$, are supposed. The payoff 
function then takes the values shown in Table 19.1. 
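
The structure of the dilemma can be checked with a short sketch. It reads Eq. (19.1) as a_i (16 - total activity) - 2 a_i, where the total includes agent i's own activity; this reading is an assumption, since the printed formula is damaged. The table is indexed by own activity and by the total activity of the other three agents.

```python
ACTIVITIES = [0, 1, 2, 3]
N_AGENTS = 4


def payoff(own, others_total):
    """Assumed reading of Eq. (19.1): a_i * (16 - total) - 2 * a_i."""
    total = own + others_total
    return own * (16 - total) - 2 * own


# Rows: own activity a_i; columns: total activity of the other agents.
max_others = (N_AGENTS - 1) * max(ACTIVITIES)
table = {a: [payoff(a, s) for s in range(max_others + 1)] for a in ACTIVITIES}

for a in ACTIVITIES:
    print(a, table[a])

# The highest activity is never worse than any lower activity ...
dominant = all(table[3][s] >= table[a][s]
               for a in ACTIVITIES for s in range(max_others + 1))
# ... yet all agents choosing 3 earn less than all agents choosing 2.
all_three = payoff(3, 9)
all_two = payoff(2, 6)
```

Under this reading, the individually rational choice leads every agent to the highest activity, where each earns less than under mutual restraint.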



Table 19.1. Payoff table of the Tragedy of the Common 



Total agent activity except agent i 
Let us consider the game in which each player decides its own activity 
based on individual rationality. The game assumes that the activity of the 
agents consumes the limited common resources. Therefore, the payoff 
decreases when the total activity increases. However, agent i will 
increase its activity whatever the total activity of the other agents, 
because agent i can increase its own payoff until the total activity 
reaches 11 in the example. Namely, the strategy of higher activity always 
dominates the strategy of lower activity. Thus all players will decide to 
increase their activities based on individual rationality. These rational 
decisions cause the limited common resources to be exhausted, and all 
agents face the tragedy. In the example, the tragedy arises when the total 
activity reaches 12. 

It is known that this game has no technical solution. Therefore, to solve 
this game, either the players should replace individual rationality with 
other types of rationality, or the problem structure, such as the payoff 
function, should be changed. One of the objectives of agent-based 
simulations is to examine what kinds of rationality and extended problem 
structures can avoid social dilemmas such as this tragedy. In this paper, 
the architecture of the proposed agent-based simulation belongs to the 
extensions of the problem structure. Namely, a meta-agent is introduced to 
prevent agents that act on individual rationality from causing the 
tragedy. The details of the proposed approach are described in the next 
section. 
19.3 Coevolving Levy Plan and Payoff Prediction 

19.3.1 Approach 

To solve the social dilemma, especially games of the tragedy of the common 
type, we propose coevolution between the levy plan of the meta-agent and 
the payoff predictions of the agents. The approach belongs to the 
extensions of the problem structure. Charging levies changes the payoff 
that players obtain relative to the original payoff structure. Therefore, 
even if the players decide their activities based on individual 
rationality, a suitable levy structure will prevent activities that 
exhaust the limited common resources. However, an issue remains with the 
levy approach: how to set a suitable levy plan. This issue is connected to 
the planning policy of the levy. In this approach, individual rationality 
is employed as the planning policy. Namely, the objective of the 
meta-agent, which controls the levy, is to maximize the incoming levy. 
Individual rationality is simple and does not require the meta-agent to 
have a specific cooperative rationality, but there is a risk that the 
meta-agent will increase the levy selfishly. To inhibit selfish behavior 
by the meta-agent, a simple payment rule for the levy from the agents to 
the meta-agent is set: if the received reward, that is, the payoff minus 
the charged levy, becomes negative, the charged agent does not pay the 
levy to the meta-agent. This simple rule and other related rules can be 
expected to inhibit selfish behavior by the meta-agent. 

As a related specification of the problem, it is assumed that the 
meta-agent and the agents do not know the payoff structure. The agents are 
required to decide their own activities without information about the 
other agents' activities. This assumption reflects complex real 
situations, in which we may not be aware that we face something like a 
social dilemma. Therefore, in the simulation model, the meta-agent and the 
agents are expected to acquire the characteristics of the given dilemma 
structure by trial and error in the iterated games. 

To acquire the hidden payoff structure, each individual agent tries to 
construct a prediction of the payoff for each of its activities. Because 
the prediction has to be constructed without information about the other 
agents' activities, implicit synchronization between the agents is 
required. The implicit synchronization arises from the levy charged by the 
meta-agent. Implicit synchronization of the agents means that the 
meta-agent can obtain stable incoming levies without being blocked by the 
charging rules. Therefore, the meta-agent also has to construct a suitable 
levy plan. A suitable levy plan means that a higher levy is charged for a 
specific activity that should be discouraged and a lower levy is charged 
for a recommended activity. According to the levy plan and their 
predictions, the agents select their activities to maximize their rewards 
based on individual rationality. Therefore, the meta-agent leads all 
agents to stably select activities associated with higher payoffs and 
charges them an adequate levy without losing income. The relation between 
the levy plan and the payoff predictions is expected to avoid the tragedy 
situation, because the meta-agent cannot collect enough levies in a 
tragedy situation. 

To realize a suitable levy plan and synchronized payoff predictions, a 
coevolution mechanism is employed as the adaptation ability of the agents. 
Each agent and the meta-agent have an independent population of 
chromosomes that represent the plan or the predictions. Genetic operations 
are applied to the chromosomes based on evaluation functions that reflect 
individual rationality. Through the experience of the iterated games, it 
is expected that the plan of the meta-agent and the predictions of the 
agents will settle so as to avoid the tragedy situation and obtain higher 
payoffs and levies. 

In the following subsections, the details of the proposed methods are 
explained. 

19.3.2 Relation between Levy Plan and Payoff Prediction 

The meta-agent has a levy plan for acquiring the incoming levy from the 
agents. The levy plan consists of the levy values expected for each 
activity of the agents. Every agent has a payoff prediction that consists 
of predicted payoff values for each of its activities. Images of the levy 
plan and the payoff prediction are illustrated in Fig. 19.1. Because the 
payoff function and the other agents' activities are hidden, the levy plan 
of the meta-agent and the payoff prediction of the agent are the only 
material available for the agents' decision making. The levy plan of the 
meta-agent is distributed to the agents in each game. The agents combine 
the received levy plan of the meta-agent with their payoff predictions for 
decision making. 

Fig. 19.1. Schematic view of relation between levy plan of meta-agent and 
payoff prediction of agent. 

The process of decision making is as follows. First, the agent combines 
the received levy plan and its own payoff prediction. From the combined 
image, agent i decides its own activity by probabilistic selection. 
Namely, the probability of activity $a_j \in \mathit{Activity}$ is 
determined from the predicted payoff at activity $a_j$ minus the value of 
the levy plan, $\mathit{Levy}_j$. In this probabilistic selection of the 
activity, negative values are normalized to positive ones. Therefore, the 
activities that have a higher payoff prediction and a lower levy value in 
the image are selected relatively often. 
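
The selection step can be sketched as follows. The text does not spell out how negative values are normalized, so this sketch assumes that all scores are shifted by the most negative one before being scaled to probabilities; the function names and the sample numbers are hypothetical.

```python
import random


def selection_probabilities(predicted_payoff, levy_plan):
    """Turn (prediction - levy) scores into a probability distribution.

    Assumption: if any score is negative, all scores are shifted by the
    minimum so that the lowest-scoring activity gets probability 0.
    """
    scores = [p - l for p, l in zip(predicted_payoff, levy_plan)]
    shift = min(scores)
    if shift < 0:
        scores = [s - shift for s in scores]
    total = sum(scores)
    if total == 0:  # all scores equal: fall back to a uniform choice
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def choose_activity(predicted_payoff, levy_plan, rng=random):
    """Draw one activity index according to the combined image."""
    probs = selection_probabilities(predicted_payoff, levy_plan)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Made-up prediction and levy plan for four activities.
probs = selection_probabilities([0, 10, 12, 6], [0, 2, 4, 10])
picked = choose_activity([0, 10, 12, 6], [0, 2, 4, 10], random.Random(0))
```

Other normalizations, such as clipping negative scores to zero, would fit the description equally well.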



19.3.3 Reward of Agent and Incoming Levy of Meta-agent 

According to the decision making of the agents, the total activity of all 
participating agents is determined. From the total activity, the payoff 
value for each agent can be determined. Fig. 19.2 shows the evaluation 
process of the reward for an agent. In this figure, agent i is assumed to 
select activity $a_i = a_3$ based on the combined image of the payoff 
prediction and the levy plan. If the relation between the total activity 
and the activity $a_i$ is the one marked C2 in the figure, the reward for 
the agent is determined as the realized payoff value minus the levy value. 
In this case, the reward value is positive. However, if the agent selects 
activity $a_i = a_1$, the reward value becomes negative. When the reward 
value becomes negative, the requested levy cannot be paid to the 
meta-agent. Therefore, the reward for the agent is paid only if 
$\mathit{Payoff}(a_i, \mathit{Total}) > \mathit{Levy}(a_i)$ is satisfied. 
If the condition is satisfied, the reward value is given by Eq. (19.2); 
otherwise the reward becomes 0. 



Fig. 19.2. Determination process of reward for agent and incoming levy for 
meta-agent. 
$$\mathit{Reward}_i = \mathit{Payoff}(a_i, \mathit{Total}) - \mathit{Levy}(a_i) \qquad (19.2)$$

The meta-agent can receive the incoming levy from agent i, 
$\mathit{Levy}_{in}(a_i)$, only if 
$\mathit{Payoff}(a_i, \mathit{Total}) > \mathit{Levy}(a_i)$. Otherwise the 
incoming levy becomes 0. Namely, the meta-agent cannot receive the 
incoming levy if the requested levy exceeds the realized payoff value. 
Therefore, the values in the levy plan are expected to become lower so 
that the meta-agent obtains incoming levies. 
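
The payment rule can be written down directly. In this sketch the payoff function has the form of Eq. (19.1) and the levy values are made up for illustration; the function names are hypothetical.

```python
def settle(payoff, levy):
    """Return (reward, incoming_levy) for one agent in one game.

    The levy is paid only when the realized payoff exceeds it;
    otherwise both the reward and the incoming levy are 0.
    """
    if payoff > levy:
        return payoff - levy, levy
    return 0, 0


def play_round(activities, levy_plan, payoff_fn):
    """Settle one game for all agents and sum the meta-agent's income."""
    total = sum(activities)
    results = [settle(payoff_fn(a, total), levy_plan[a]) for a in activities]
    rewards = [reward for reward, _ in results]
    meta_income = sum(levy for _, levy in results)
    return rewards, meta_income


def example_payoff(activity, total):
    # Payoff of the form of Eq. (19.1) for 4 agents (illustration only).
    return activity * (16 - total) - 2 * activity


# Levy plan indexed by activity 0..3; the values are made up.
rewards, meta_income = play_round([1, 2, 2, 3], [0, 2, 5, 20], example_payoff)
```

In this example the levy on activity 3 is higher than the realized payoff, so the meta-agent collects nothing from that agent, which is the pressure that keeps the levy plan from growing without bound.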



19.3.4 Evaluation of Game 

To receive sufficient rewards and incoming levies, a suitable levy plan 
and payoff predictions must be constructed. To adjust the plan and the 
predictions, loss values are calculated as the evaluation of the game. The 
loss values for a game are determined as follows: 

$$\mathit{Loss}_i = \mathit{Payoff}_{exp}(a_i) - \mathit{Reward}_i \qquad (19.3)$$

$$\mathit{Loss}_{meta} = \sum_i \big(\mathit{Levy}_{in}(a_i) - \mathit{Levy}(a_i)\big) \qquad (19.4)$$

where $\mathit{Loss}_i$ is the evaluation value of agent i with activity 
$a_i$, and $\mathit{Loss}_{meta}$ is the summation of the losses related 
to the activities $a_i$ of the agents i. 

According to the received rewards, the incoming levies and the losses, the 
levy plan and the payoff predictions are adjusted in the coevolution 
process. 



19.3.5 Coevolution of Plan and Predictions 

The whole game process is shown in Fig. 19.3. Through decision making, 
judgment and evaluation, the rewards, incoming levies, and losses are 
determined. Based on these values, the evaluations of the plan, 
$E_{meta}$, and of the predictions, $E_i$, are calculated as follows: 

$$E_i = \frac{\mathit{Reward}_i}{\mathit{Loss}_i} \qquad (19.5)$$

$$E_{meta} = \frac{\sum_i \mathit{Levy}_{in}(a_i)}{\mathit{Loss}_{meta}} \qquad (19.6)$$

Using the evaluation values, each agent and the meta-agent execute GA 
operations to adjust the plan and predictions. Every agent, including the 
meta-agent, has a population of chromosomes as its GA population. The 
chromosomes represent the plan or the predictions. The objective of each 
GA is to maximize the evaluation value: to maximize the reward without 
loss for the agents, and to maximize the incoming levy without loss for 
the meta-agent. A schematic view of the coevolution process is shown in 
Fig. 19.4. 
Fig. 19.3. Game process including decision-making, judgment and evaluation 
with levy plan and payoff prediction. 

Fig. 19.4. Coevolution between levy plan of meta-agent and predictions of 
agents. 
19.4 Simulation 

To confirm the effectiveness of the proposed method for avoiding the 
tragedy situation in social dilemmas, simulations are executed. The payoff 
function is set as follows: 

$$\mathit{Payoff}_i = a_i \Big(|A| \times N - \sum_{j=1}^{N} a_j\Big) - 2 a_i \qquad (19.7)$$

where A denotes the activity, A = 0, 1, 2, 3, 4, and N is the number of 
agents. In this simulation, N is set to 4, 6 and 8 to examine the effect 
of the number of agents. Each agent and the meta-agent has 30 chromosomes. 
Each chromosome consists of 4 sections, one for each activity or levy 
value. The length of a section is adjusted to represent the range of the 
payoff function. Each section is decoded as the number of 1s it contains. 
With the decoded plan and predictions, the game, the tragedy of the 
common, is iterated 10 times. The averaged evaluation values are given as 
the fitness of the chromosomes. Crossover and mutation are applied to the 
chromosomes. The crossover rate is 1.0 and the mutation rate is 0.05. 
Under these parameters, coevolution of the meta-agent and the agents is 
executed for 200 generations. 
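
The chromosome encoding can be sketched as follows. The population size, the number of sections, and the crossover and mutation rates come from the text; the section length of 16 bits, one-point crossover, and per-bit mutation are assumptions made for illustration, and the function names are hypothetical.

```python
import random

N_SECTIONS = 4        # one section per activity (from the text)
SECTION_BITS = 16     # assumption: the text only says the length fits the payoff range
POPULATION = 30       # chromosomes per agent (from the text)
MUTATION_RATE = 0.05  # from the text; applied per bit here (assumption)


def decode(chromosome):
    """Decode each section as the number of 1 bits it contains."""
    return [sum(chromosome[k * SECTION_BITS:(k + 1) * SECTION_BITS])
            for k in range(N_SECTIONS)]


def crossover(parent_a, parent_b, rng):
    # Assumption: one-point crossover, applied to every pair (rate 1.0).
    point = rng.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(chromosome, rng):
    # Flip each bit independently with probability MUTATION_RATE.
    return [bit ^ 1 if rng.random() < MUTATION_RATE else bit
            for bit in chromosome]


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(N_SECTIONS * SECTION_BITS)]
              for _ in range(POPULATION)]
child_a, child_b = crossover(population[0], population[1], rng)
plan = decode(mutate(child_a, rng))
```

Counting 1 bits makes every bit contribute equally to the decoded value, so a single mutation changes a levy or prediction value by exactly one unit.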



19.4.1 Game without Meta-agent 

To confirm the self-interested rationality of the agents, simulations 
without the meta-agent are executed. The number of agents is 4 and 6 in 
these simulations. The results are shown in Fig. 19.5 and Fig. 19.6. In 
both figures, the acquired payoff predictions take larger values as the 
activity increases. Thus, the agents tend to select the higher activities 
in the game, as can be seen from the histograms in the figures. Namely, 
the agents in both cases fall into the tragedy situation. 



19.4.2 Simulations with Meta-agents 

To control the self-interested agents so as to avoid the tragedy 
situation, the meta-agent is introduced in the simulations. The number of 
agents is 4, 6, and 8. One of the evolution processes of the meta-agent 
and 4 agents is shown in Fig. 19.7. This figure shows that all of the 
agents and the meta-agent succeed in obtaining sufficient evaluation 
values. 

Fig. 19.8 and Fig. 19.9 show the acquired payoff predictions, the levy 
plan and the histogram of selected activities in the cases of 4 agents and 
6 agents. In both cases, the meta-agent sets the levy for activity 4 above 
the payoff prediction value for activity 4. This means that the 
meta-agents in both cases prohibit the agents from selecting activity 4. 
The effects of the acquired levy plans can be seen in the histograms of 
the selected activities: the agents did not select activity 4. Therefore, 
the meta-agents succeed in controlling the agents so that they avoid the 
tragedy situation in these cases. The strategy of the meta-agents, based 
on self-interested rationality, evolves to obtain higher levies stably by 
avoiding the tragedy situation. 

Fig. 19.5a-b. Acquired payoff predictions of 4 agents without meta-agent 
(a) and histogram of selecting activities of agents (b). 

Fig. 19.6a-b. Acquired payoff predictions of 6 agents without meta-agent 
(a) and histogram of selecting activities of agents (b). 

Fig. 19.7a-b. Evolution process of meta-agent (a) and 4 agents (b). 



Fig. 19.8a-b. Acquired payoff predictions of 4 agents and Levy plan of 
meta-agent (a) and histogram of selecting activities of agents (b). 

Fig. 19.9a-b. Acquired payoff predictions of 6 agents and Levy plan of 
meta-agent (a) and histogram of selecting activities of agents (b). 



In the above cases, the meta-agents succeed in controlling the activities 
of the agents. However, the situation changes when the number of agents 
becomes 8. The result of the 8-agent case is shown in Fig. 19.10. The 
acquired levy plan prohibits selecting activity 3, and some agents refrain 
from selecting activity 4. Thus most agents can select activity 4, and 
they sometimes come close to the tragedy situation. That is, the strategy 
of the meta-agent is changed in this case. If a few agents select the 
higher activity and the others select lower activities, the agents 
selecting the higher activity get large payoffs and the meta-agent gets a 
higher incoming levy from these agents. Such situations did not occur in 
the previous cases. The meta-agent becomes aware of these situations in 
the evolution process. Thus the effective strategy of the meta-agent was 
changed in this case. Because some agents refrain from selecting the 
highest activity, the complete tragedy situation is avoided. However, 
self-interested rationality sometimes brings the agents close to the 
tragedy situation. 



Fig. 19.10a-b. Acquired payoff predictions of 8 agents and Levy plan of 
meta-agent (a) and histogram of selecting activities of agents (b). 



19.5 Conclusion 

In this paper, the Tragedy of the Common, which is one of the social 
dilemmas, is treated in an agent-based simulation. In this game, the 
meta-agent prepares the levy plan based on individual rationality. The 
agents make decisions based on the levy plan and their payoff predictions. 
Through the coevolution of the plan and predictions in the simulation, the 
levy plan can prevent the agents from selecting activities that lead 
toward the tragedy situation when the group of agents is small. However, 
when the number of agents becomes large, the strategy of the meta-agent 
changes. The complete tragedy situation can be avoided, but the agents 
sometimes come close to the tragedy situation. How to evaluate the 
closeness to the tragedy situation in the interaction between the 
meta-agent and the agents remains an open issue. 
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In this paper, our purpose is to represent the establishment of the norm as 
the indirect sanction of mutual choice that individuals have the rights to 
refuse interaction. We introduce a mutual choice mechanism in the norms 
game [20.2, 20.8] instead of a direct penal regulation and then reformulate 
the norms and metanorms games with mutual choice. As a result, through 
an agent-based simulation, we confirm that the metanorm for mutual choice 
supports the establishment of the norm. 



20.1 Introduction 

The aim of the norms game [20.2, 20.8] is to investigate the emergence and
stability of behavioral norms in the context of a game with bounded
rationality. The following definition of a norm was formulated by Axelrod: a
norm exists in a given social setting to the extent that individuals usually
act in a certain way and are often punished when seen not to be acting in this
way. In the norms game, an individual player first decides whether to
cooperate or defect. The payoff function of this choice is similar to that of
the N-person Prisoner’s Dilemma (N-PD) [20.3, 20.4]. If a player chooses to
defect, some of the other players may observe the defection, and these
observers may then choose to punish the defector based on the norm “punish
those who defect.” If the defector is punished, the defector receives a very
painful payoff, but the punisher has to pay an enforcement cost. The result of
this game, obtained through an agent-based simulation with an evolutionary
approach, was that the norm collapses but that, if the metanorm is introduced,
the norm becomes established. The metanorm was defined as “one must punish
those who do not support a norm (those who do not punish a defection).”

The sanction applied in the norms game is that an individual player has
the right to punish a defector, or in other words, to directly decrease the
payoff of the defector. Do defectors readily agree to such enforcement of a
sanction that punishes them and accept a decreased payoff without resistance?
For example, a tax delinquent (defector) may not pay a penalty tax if no
compulsory payment is enforced by a centralized direct regulation mechanism.
A tax delinquent may also fall into arrears on his or her penalty tax.
Therefore, a centralized direct regulation mechanism is necessary to compel a
tax delinquent to pay the penalty tax. If there is no compelling power,
defectors would probably not support penal regulations against defectors, and
a penal regulation established by an individual would not be enforced.
Therefore, it may seem strange to assume that an individual player has the
right to punish a defector by directly decreasing the payoff of the defector
without the backing of a centralized direct regulation mechanism.

To avoid this difficulty, we refer to studies on partner selection in the
multiple IPD, because the concept of partner selection in the Prisoner’s
Dilemma (PD) [20.3] can be considered a kind of sanction. In previous
research, many partner selection mechanisms have been proposed: the ostracism
option [20.9], the choice and refusal mechanism [20.1, 20.11], mutual and
unilateral choice [20.6], and the option of not playing the game
[20.4, 20.6, 20.10, 20.12]. We adopt mutual choice because it does not need
the right to directly decrease the payoff of other players. Although this
mechanism is used for matching two players, we apply it to an N-person game
in the next section. Under the condition that the payoff without game
partners is lower than all payoffs with game partners, the mutual choice
mechanism works as an indirect sanction: if all players refuse a given player,
the payoff of that player is decreased indirectly, although no player
decreases it directly.

In this research, we introduce a mutual choice mechanism into the norms 
game instead of direct penal regulation and then reformulate the norms game 
with mutual choice. Furthermore, we introduce a metanorm based on mutual 
choice. In order to examine the influence of mutual choice, we observe the 
behaviors of players through an agent-based simulation. 



20.2 Mutual Choice in Group Formation 

Although previous mutual choice [20.1, 20.7, 20.11] schemes were designed 
as matching mechanisms where two players play a PD game if both agree to 
play, we applied this mechanism to an N-person game. Here, we introduce the 
concept of “group formation [20.5],” which is the process of players choosing 
each other from within their respective groups and then interacting (playing 
N-IPD) with only members of the selected player’s group. A group is a subset 
of the overall player set, and each player can join only one group. 

The strategy of a player has two dimensions, boldness and vengefulness, in
the same way as in the original norms game. Let N = {1, ..., i, ..., n} be the
player set. Boldness B_i represents player i's degree of boldness to defect,
and vengefulness V_i represents player i's degree of vengefulness toward
defection by the other players.

20.2.1 Norms Game with Mutual Choice 

The norm based on direct sanction in the original norms game was changed to a
norm based on mutual choice that instructs players to “refuse to interact (play
the N-PD game) with defectors.” Players make decisions on group formation
in random order. There are two alternatives in group formation: forming a
new group or joining an existing group. The procedure used for decision
making in group formation is as follows.

1) At the t-th iteration of group formation, the first player cannot join an
existing group but has to form a new group. Each player who makes a decision
after the first player chooses one group k out of the group set
G = {G_1, ..., G_k, ..., G_m}, where G_k is a set of players that have already
made a decision on group formation. Player i chooses a group based on its
expected cooperation with each other player j (∈ N), which is denoted by
π_t(i|j) [20.1, 20.11]. This expected cooperation is used to determine which
group is most tolerable. Given any player i, group k is tolerable for player i
in iteration t only if

\frac{1}{|G_k|} \sum_{j \in G_k} \pi_t(i|j) \geq V_i. \qquad (20.1)

We define the groups satisfying condition (20.1) as “tolerable groups.” If any
groups are tolerable to player i, then player i makes a game offer to the
group k whose average expected cooperation for player i is highest.

2) After the group choice of player i, the group k chosen by player i
is given an opportunity to refuse or accept the game offer of player i. The
players in group k decide by a majority vote whether to refuse or accept the
game offer. Player j in group k agrees to accept player i only if
π_t(j|i) ≥ V_j. If the majority of the players agree to accept player i,
group k accepts the game offer and player i joins group k. Player i is
added to group k as G_k ∪ {i}, which is the new group k.

3) If group k refuses player i, player i makes a game offer to group l,
whose average expected cooperation for player i is second highest. Player i
continues making game offers until a group accepts its game offer or until all
tolerable groups refuse it. If player i is refused by all tolerable groups,
player i forms a new group m + 1. The new group G_{m+1}, including only
player i, is added to group set G, and group set G is then modified to
G = {G_1, ..., G_k, ..., G_m, G_{m+1}}.

4) After decision making for group formation, players in groups of more 
than two players play N-IPD with the players in the same group. 
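
The group formation procedure in steps 1) to 3) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the list-of-lists layout of the expected cooperation values `pi`, the non-strict comparisons, and the rule that a tied vote counts as a refusal are all assumptions made here. A group is treated as tolerable for player i when i's average expected cooperation over its members is at least V_i.

```python
def average_expectation(pi, i, group):
    """Average expected cooperation that player i holds for the members of group."""
    return sum(pi[i][j] for j in group) / len(group)


def group_accepts(pi, V, group, i):
    """Majority vote: member j agrees only if pi[j][i] >= V[j] (ties refuse, an assumption)."""
    agree = sum(1 for j in group if pi[j][i] >= V[j])
    return agree > len(group) / 2


def form_groups(order, pi, V):
    """One iteration of group formation; returns a list of groups (lists of player ids)."""
    groups = []
    for i in order:
        # Tolerable groups for player i, highest average expected cooperation first.
        tolerable = [g for g in groups if average_expectation(pi, i, g) >= V[i]]
        tolerable.sort(key=lambda g: average_expectation(pi, i, g), reverse=True)
        for g in tolerable:
            if group_accepts(pi, V, g, i):
                g.append(i)          # offer accepted: i joins the group
                break
        else:
            groups.append([i])       # first mover, or refused everywhere: new group
    return groups


# Players 0 and 1 trust each other; both expect little cooperation from player 2.
pi = [[1.0, 1.0, 0.2],
      [1.0, 1.0, 0.2],
      [1.0, 1.0, 1.0]]
V = [0.5, 0.5, 0.5]
print(form_groups([0, 1, 2], pi, V))   # [[0, 1], [2]]
```

In this toy run, player 2 is refused by the only existing group and ends up alone, which is the indirect sanction that mutual choice is meant to provide.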

In the initial iteration of group formation, prior to any interaction, all
players have the same initial expected cooperation value π_0 for each player.
Expected cooperation values are updated whenever N-PDs are played. Consider
any player i in group k. If player j is not in the group k that includes
player i in the current iteration t, the expected cooperation value π_t(i|j)
is not changed. On the other hand, if player j is in the group k that includes
player i, they both play N-IPD in group k. In the N-IPD of group k, player i
can observe the other player j’s decisions and denotes them as S(i|j). If
player j cooperates at rate s over all iterations of the N-IPD, player i
records the decision-making history of player j as the cooperation rate
S(i|j) = s (0 ≤ s ≤ 1). Player i’s expected cooperation value for player j is
updated by taking the weighted average over player i’s decision-making history
with player j,

\pi_{t+1}(i|j) = w \pi_t(i|j) + (1 - w) S(i|j), \qquad (20.2)

where the memory weight w controls the relative weighting of distant to
recent decision making. Players can observe the decisions and update the
expected cooperation values only of players in the same group.
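
As a small worked illustration of the update rule (20.2), the following Python sketch applies the weighted average with the memory weight w = 0.8 listed in Table 20.1. The function name and the example cooperation rate of 0.5 are hypothetical and chosen here only for illustration.

```python
def update_expectation(pi_old, s, w=0.8):
    """Equation (20.2): weighted average of the old expectation and the observed rate s."""
    return w * pi_old + (1.0 - w) * s


# Start from full trust (1.0) and observe a partner who cooperates half the time.
pi = 1.0
for _ in range(2):
    pi = update_expectation(pi, s=0.5)
    print(round(pi, 4))   # 0.9, then 0.82
```

With w close to 1 the expectation decays slowly, so a single bad encounter does not immediately make a partner intolerable.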

20.2.2 Metanorms Game with Mutual Choice 

The metanorm we adopt is “refuse to interact (play the N-PD game) with 
those who interact (play) with defectors.” The metanorms game with mutual 
choice is based on an extension of the norms game with mutual choice. 

When player i makes a game offer to group k, the players in group k decide
whether to accept or refuse player i. If the majority of the players agree to
accept player i, the players opposing the acceptance regard the players
agreeing to it as players who accept a defector into the group. We define the
players in group k agreeing to accept player i as G_k^agree and the players
opposing the acceptance as G_k^oppose, where G_k = G_k^agree ∪ G_k^oppose and
G_k^agree ∩ G_k^oppose = ∅. The players opposing the acceptance of player i
leave group k and form a new group based on the metanorm. G_k^oppose is
assigned to G_{m+1}, and G is then modified to
G = {G_1, ..., G_k, ..., G_m, G_{m+1}}. Then, player i joins group k
(G_k = G_k^agree ∪ {i}). If group m + 1 includes only one player j, player j
makes a game offer to its tolerable groups based on the above-described
process of group formation.
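
The split of a group under the metanorm can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the function name `metanorm_join`, the data layout, the non-strict vote condition, and the tie rule are assumptions, and the follow-up step in which a lone opposing player makes new game offers is omitted.

```python
def metanorm_join(groups, k, i, pi, V):
    """Player i makes a game offer to groups[k]; returns True if i joined.

    On acceptance, the members who opposed leave and form a new group.
    """
    group = groups[k]
    agree = [j for j in group if pi[j][i] >= V[j]]
    oppose = [j for j in group if pi[j][i] < V[j]]
    if len(agree) <= len(group) / 2:
        return False                 # offer refused; nobody leaves
    groups[k] = agree + [i]          # G_k = G_k^agree united with {i}
    if oppose:
        groups.append(oppose)        # G_k^oppose becomes the new group G_{m+1}
    return True


# Members 0 and 1 would accept player 3; member 2 opposes and leaves.
pi = [[1.0, 1.0, 1.0, 1.0],
      [1.0, 1.0, 1.0, 1.0],
      [1.0, 1.0, 1.0, 0.1],
      [1.0, 1.0, 1.0, 1.0]]
V = [0.5, 0.5, 0.5, 0.5]
groups = [[0, 1, 2]]
print(metanorm_join(groups, 0, 3, pi, V), groups)   # True [[0, 1, 3], [2]]
```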



20.3 Simulation Setup 

In this paper, because our purpose is to examine the influence of mutual
choice on the norms game and the metanorms game, we concentrate on the
establishment and maintenance of a norm. We conducted simulations for both
the norms game and the metanorms game under two initial conditions. The
first condition is that a norm has already been established and that each
player is not bold at all, that is, V_i = 1 and B_i = 0 (∀i ∈ N). Under this
condition, we examine whether it is possible to maintain a norm established
by mutual choice. The second condition is that a norm is not established at
all and each player is completely bold, that is, V_i = 0 and B_i = 1 (∀i ∈ N).
Under this condition, we examine whether it is possible to establish a norm
by mutual choice.

Table 20.1. Common parameters in the simulations of four cases. |C_k| represents
the number of cooperating players in group k.

Number of players                            50
Number of generations                        10000
Number of mutual choices per generation      200
Number of N-PDs per mutual choice            20
Initial expected cooperation value π_0       1.0
Memory weight w                              0.8
Mutation rate                                0.01
Payoff function of cooperators in group k    |C_k|/|G_k|
Payoff function of defectors in group k      0.6 + |C_k|/|G_k|
Payoff for lone player P_alone               0.01

In our simulations, genetic algorithms are applied to evolve the players’
strategies. The two dimensions of a strategy, boldness B_i and vengefulness
V_i, are each divided into 32 equal levels, from 0 to 1. Because 32 levels are
represented by 5 binary bits, a player’s strategy needs a total of 10 bits:
5 bits for boldness B_i and 5 bits for vengefulness V_i. Each simulation is
initialized with a population of all players. A simulation consists of a
sequence of generations interspersed with genetic phases. Each generation
consists of iterations of the norms or metanorms game with mutual choice, in
which players make, refuse, and accept game offers, that is, conduct group
formation, and then play N-IPD. At the beginning of the genetic phase, each
player’s strategy in the population is assigned a fitness equal to its average
payoff, that is, its total payoff divided by the number of payoffs received.
A partner for crossover is selected by means of roulette wheel selection.
Uniform crossover is performed between the strategies of a player and its
partner to obtain a new strategy for one offspring. After that, the strategy
of this offspring is subjected to mutation, where each bit is flipped with a
certain probability.
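
The genetic phase described above can be sketched with a few small Python functions. This is a minimal sketch under assumptions: the bit order (boldness first, then vengefulness), the mapping of the 32 levels to level/31, and all function names are choices made here for illustration and are not taken from the paper.

```python
import random

BITS = 5                 # bits per strategy dimension
LEVELS = 2 ** BITS       # 32 levels


def decode(genome):
    """Map a 10-bit list to (boldness, vengefulness), each in [0, 1]."""
    def level(bits):
        return int("".join(map(str, bits)), 2) / (LEVELS - 1)
    return level(genome[:BITS]), level(genome[BITS:])


def roulette(population, fitness, rng):
    """Pick one genome with probability proportional to its fitness."""
    return rng.choices(population, weights=fitness, k=1)[0]


def uniform_crossover(a, b, rng):
    """Each offspring bit is copied from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]


rng = random.Random(0)
bold = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]      # B = 1, V = 0
vengeful = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # B = 0, V = 1
print(decode(bold))                         # (1.0, 0.0)
partner = roulette([bold, vengeful], [0.6, 1.0], rng)
child = mutate(uniform_crossover(vengeful, partner, rng), 0.01, rng)
```

Fitness values passed to `roulette` must be non-negative with a positive sum, which holds here because every payoff in Table 20.1 is positive.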

It would be interesting to introduce the “bandwagon effect” [20.5], which uses
group size, into the payoff function of the N-PD game, but our purpose is to
examine the influence of the norm and the metanorm. Therefore, to simplify our
model, we do not adopt the bandwagon effect, and the payoff function of the
N-PD game in each group depends only on the proportions of cooperating and
defecting players. The important parameters and the payoff function of the
N-PD game are shown in Table 20.1.
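
The payoff function can be written out directly. The sketch below assumes that the payoff functions in Table 20.1 read |C_k|/|G_k| for cooperators and 0.6 + |C_k|/|G_k| for defectors, where |C_k| is the number of cooperating players in group k, and that a single-member group receives the lone-player payoff 0.01. The function name and the dictionary layout are hypothetical.

```python
P_ALONE = 0.01        # payoff for a lone player (Table 20.1)
DEFECT_BONUS = 0.6    # constant term in the defectors' payoff


def payoffs(moves):
    """moves: dict player -> True (cooperate) or False (defect) within one group.

    Returns a dict player -> payoff for one N-PD round.
    """
    if len(moves) < 2:
        return {p: P_ALONE for p in moves}
    ratio = sum(moves.values()) / len(moves)      # |C_k| / |G_k|
    return {p: ratio if c else DEFECT_BONUS + ratio for p, c in moves.items()}


print(payoffs({"a": True, "b": True, "c": True}))   # every cooperator gets 1.0
print(payoffs({"a": False, "b": False}))            # every defector gets 0.6
print(payoffs({"a": True}))                         # a lone player gets 0.01
```

Under these assumptions a defector always earns 0.6 more than a cooperator in the same group, while mutual cooperation (1.0 each) beats mutual defection (0.6 each), which is the N-PD structure. With 50 players, the smallest possible in-group payoff is 1/50 = 0.02, so being alone (0.01) is worse than any payoff with partners.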



20.4 Simulation 

20.4.1 Maintenance of Norm 

First, we explain the maintenance of the norm in the norms and metanorms
games with mutual choice. The results of 10 runs are shown in Figs. 20.1
and 20.2. The 10 circles indicate the average boldness and vengefulness of all
players after 10000 generations. The typical dynamics of the maintenance of
the norm in each game are shown in Figs. 20.3 and 20.4.

Fig. 20.1. Results of 10 runs of the maintenance of the norm in the norms game
with mutual choice: average boldness and vengefulness of all players in 10000
generations under the initial condition V_i = 1 and B_i = 0 (∀i ∈ N).

Fig. 20.2. Results of 10 runs of the maintenance of the norm in the metanorms
game with mutual choice: average boldness and vengefulness of all players in
10000 generations under the initial condition V_i = 1 and B_i = 0 (∀i ∈ N).

Fig. 20.3. Example of the maintenance of the norm in the norms game with
mutual choice: transition of average boldness and vengefulness of all players
under the initial condition V_i = 1 and B_i = 0 (∀i ∈ N).

Fig. 20.4. Example of the maintenance of the norm in the metanorms game with
mutual choice: transition of average boldness and vengefulness of all players
under the initial condition V_i = 1 and B_i = 0 (∀i ∈ N).

In all of the runs shown in Figs. 20.1 and 20.2, there was little boldness
and a great deal of vengefulness. The initial condition was V_i = 1 and
B_i = 0 (∀i ∈ N). Furthermore, in all runs the dynamics of the average
boldness and vengefulness of the population were similar to the typical
dynamics shown in Figs. 20.3 and 20.4. Therefore, mutual choice can maintain
the norm in both the norms game and the metanorms game, because little
boldness and a great deal of vengefulness were kept throughout the
generations.

In the following explanation, we represent a player with a high level of
boldness as having B_high and a player with a low level of boldness as having
B_low. In the same way, we represent players as having V_high and V_low. The
reason for the maintenance of the norm is as follows. Mutation of player
strategies increases boldness or decreases vengefulness, because under the
initial condition there was little boldness and a great deal of vengefulness.
A player whose boldness is increased by mutation, that is, a player with
B_high, cannot join the groups because the other players with V_high refuse
the game offers of this player. Consequently, the player with increased
boldness acquires a lower payoff. The player with B_high cannot free-ride on
a player whose vengefulness is decreased by mutation, that is, a player with
B_low and V_low. The reason for this is that, if the player with B_low and
V_low joins a group, the other players with V_high in the same group would
refuse the game offer of the player with B_high. Even if the player with
B_high tries to free-ride on the players with B_low and V_low, they acquire
lower payoffs. As a result, they are not selected in the GA and perish.
Although in this generation the number of players with B_high increases, in
the next generation the players with B_high cannot free-ride to acquire higher
payoffs than the players with V_high who cooperate with each other in the
group. This is because the players with B_low and V_low have perished. As a
result, the number of players with B_high does not increase. Therefore, in the
norms and metanorms games with mutual choice, the norm does not collapse
and can be maintained.

Fig. 20.5. Results of 10 runs of the establishment of the norm in the norms
game with mutual choice: average boldness and vengefulness of all players in
10000 generations under the initial condition V_i = 0 and B_i = 1 (∀i ∈ N).

Fig. 20.6. Results of 10 runs of the establishment of the norm in the
metanorms game with mutual choice: average boldness and vengefulness of all
players in 10000 generations under the initial condition V_i = 0 and B_i = 1
(∀i ∈ N).

20.4.2 Establishment of Norm 

Next, we explain the establishment of the norm in the norms and metanorms 
games with mutual choice. The results of 10 runs are shown in Figs. 20.5 
and 20.6. The 10 circles indicate the average boldness and vengefulness of all 
players after 10000 generations. The typical dynamics of the establishment 
of the norms in each game is shown in Figs. 20.7 and 20.8. 

The norms game. In nine of the runs shown in Fig. 20.5, we can observe a
great deal of boldness but little vengefulness. Mutation of player strategies
decreases boldness or increases vengefulness because the initial condition is
V_i = 0 and B_i = 1 (∀i ∈ N).

Fig. 20.7. Example of the establishment of the norm in the norms game with
mutual choice: transition of average boldness and vengefulness of all players
under the initial condition V_i = 0 and B_i = 1 (∀i ∈ N).

Fig. 20.8. Example of the establishment of the norm in the metanorms game with
mutual choice: transition of average boldness and vengefulness of all players
under the initial condition V_i = 0 and B_i = 1 (∀i ∈ N).

First, we assume that there is only one player with B_low and V_high. A
player whose boldness is decreased by mutation, that is, a player with B_low,
cannot acquire a higher payoff than the players with B_high, because the
players with B_high free-ride on the players with B_low. Accordingly, the
players with B_low do not increase in the next generation. A player whose
vengefulness is increased by mutation, that is, a player with V_high, cannot
acquire a higher payoff than the players with B_high, because players with
V_high do not join a group consisting of players with B_high. Therefore, a
single player given B_low and V_high by mutation cannot acquire a higher
payoff than the players with B_high and V_low. Consequently, this player is
not selected in the GA and perishes.

Next, we assume that there are several players with B_low and V_high. If a
group consists only of players with B_low and V_high, the group refuses the
game offers of players with B_high. If a group consists of both players with
B_low and V_high and players with B_low and V_low, it is possible that a
player with B_high joins this group and free-rides. The player with B_high
can join the group because, while the players with B_low and V_high oppose the
acceptance of its game offer, the players with B_low and V_low agree to it. If
the players with B_low and V_low win the majority vote over the players with
B_low and V_high, the player with B_high can join the group. The players with
B_low cannot acquire higher payoffs than the free-rider. Consequently, they
are not selected in the GA and perish. Although there are several players
with B_low and V_high, the players with B_high and the players with B_low and
V_low prevent the norm from becoming established. The players with B_high
directly prevent the norm’s establishment because they free-ride on the
players with B_low and V_high. The players with B_low and V_low indirectly
prevent the norm’s establishment because they accept game offers from the
players with B_high, who free-ride on the players with B_low. Therefore, in
the norms game the norm collapses and does not become established.

In the remaining one run of Fig. 20.5, there was little boldness and a great
deal of vengefulness. The reason for the failure to establish the norm in the
other runs was that a player with B_high can join a group consisting of both
players with B_low and V_high and players with B_low and V_low. If there are
players with B_low and V_high but no players with B_low and V_low, the players
with B_high cannot join the group and then defect against each other. As a
result, the players with B_high acquire lower payoffs than the players with
B_low and V_high, who cooperate with each other. If the number of players with
B_low and V_high increases and they predominate in the population for a few
generations before the number of players with B_low and V_low increases by
crossover or mutation, the norm becomes established. Therefore, since the
simulation results (Fig. 20.5) show that the norm was established in only one
out of ten runs, it is not impossible but difficult to establish a norm in the
norms game with mutual choice.

The metanorms game. In all runs shown in Fig. 20.6, there was little
boldness and a great deal of vengefulness. In the norms game the establishment
of the norm fails because the players with B_low and V_low accept the game
offers of the players with B_high. In the metanorms game, if the players with
B_low and V_low agree to accept the game offer of a player with B_high and
the group as a whole also accepts it, the players with B_low and V_high leave
the group based on the metanorm; they refuse to play the N-PD game with
those who play with defectors. The metanorm prevents the player with B_high
from free-riding on the players with B_low and V_high, because, if the player
with B_high joins the group, the players with B_low and V_high leave the
group. As a result, if there are some players with B_low and V_high, they can
form a group without the player with B_high. The players with B_low and
V_high can acquire higher payoffs because they cooperate with each other.
Throughout this process, the number of players with B_low and V_high
increases and they predominate in the population. Therefore, the norm becomes
established.

In both the norms and the metanorms games with mutual choice, mutual choice
can maintain the norm once the norm becomes established, just as punishment
does in the original games. The simulation results thus indicate that the norm
can be maintained by mutual choice.

In the norms game with mutual choice, the non-vengeful cooperators, who
cooperate with anyone and accept any game offer, indirectly prevent the
establishment of the norm because they allow defectors to join the group. As
a result, the norm collapses and does not become established. Therefore, it
is not impossible but difficult to establish the norm by mutual choice in the
norms game.

In the metanorms game with mutual choice, although the non-vengeful
cooperators accept the game offers of defectors and win the majority vote, the
vengeful cooperators, who play with neither the defectors nor the non-vengeful
cooperators, leave the group. Because the vengeful cooperators acquire higher
payoffs more stably than the non-vengeful cooperators and the defectors, and
also because the number of vengeful cooperators increases in the genetic
phase, the norm becomes established. Therefore, the metanorm concerning
mutual choice supports the establishment of the norm just as in the original
metanorms game.



20.5 Conclusion 

In this paper, rather than a direct sanction, we introduced mutual choice as
an indirect sanction into the original norms and metanorms games. We proposed
a norms game and a metanorms game with mutual choice by reformulating the
original norm and metanorm in terms of mutual choice. In order to examine
the influence of mutual choice, we focused on the maintenance and
establishment of the norm. We conducted agent-based simulations under two
initial conditions to study the possibility of maintaining and establishing
the norm in the norms game and the metanorms game with mutual choice. As a
result, we confirmed that mutual choice, as an alternative to the punishment
of the original games, can maintain the norm once the norm becomes
established. In the norms game with mutual choice, it is not impossible but
difficult to establish the norm by mutual choice. In the metanorms game with
mutual choice, the metanorm on mutual choice supports the establishment of the
norm just as in the original metanorms game.



In this paper, we propose a method to obtain strategy coalitions, whose
confidences are adjusted by a genetic algorithm to improve the generalization
ability, in the process of co-evolutionary learning with a social game called
the Iterated Prisoner’s Dilemma (IPD) game. Experimental results show that
several better strategies can be obtained through strategy coalition, and that
evolutionary optimization of the confidences of the strategies within a
coalition improves the generalization ability.



21.1 Introduction 

Individuals’ behaviors in social and economic systems are complex and often
difficult to understand. Generally, an individual’s action is motivated by a
certain stimulus, so the action mechanism can be regarded as a kind of dynamic
system. So far, there has been much work from the perspective of game theory
on the complex phenomena that individuals in such dynamic systems show, but it
is difficult to deal with more realistic and complex models. Hence, we attempt
to understand complex phenomena and systems from the viewpoint of evolution
in the field of computer science.

Among many economic and mathematical games, the Iterated Prisoner’s Dilemma
(IPD) game is simple but can deal with complex problems such as social and
economic phenomena. Axelrod studied strategies between humans using the IPD
game [21.1]. Individuals in social and economic systems show adaptive
behavior in a changing environment, because their behavior can be regarded as
a response that adapts to a stimulus. In particular, the immune system in
biological systems is a representative example that shows stimulus-response
behavior well. The immune system can defeat external invaders by gating each
invader to the optimal antibody among many antibodies. In the field of
co-evolutionary learning, there are many attempts to obtain better strategies
by incorporating this property, and among them fitness sharing is one of the
most well-known approaches [21.6].

In this paper, we propose a method to obtain better strategies that adapt
to unknown environments, in particular strategies that perform well against
unknown opponents in the IPD game. To deal with this problem, we introduce
strategy coalitions, which can be easily recognized in social and economic
systems, and obtain them in the process of the evolution of strategies. Here,
a strategy coalition consists of better strategies extracted from the
population. Each strategy in a coalition has a confidence that identifies its
proportion of participation in determining the next move of the coalition. In
order for the strategies in a coalition to behave adaptively against changing
opponent strategies, we let the confidences of the strategies change with the
opponent using another evolutionary learning process.

Section 2 introduces the IPD game and the evolutionary approach to modeling
the game. Section 3 illustrates the evolution of confidences and the gating of
strategies in a coalition to improve the generalization ability, and
experimental results are shown in Section 5.



21.2 Evolutionary Approach to IPD Game 

One of the most well-known games for modeling complex social, economic,
and biological systems is the IPD game [21.2]. In the 2-player IPD game,
each player can choose one of two choices, defection (D) or cooperation
(C). This game is non-zero-sum and non-cooperative: one player’s gain may
not be the same as the other player’s loss, and there is no communication
between the two players. The game is repeated indefinitely, and neither
player knows when the game is supposed to end.



Table 21.1. Payoff matrix of the 2IPD game. T > R > P > S, 2R > T + P

                Cooperate    Defect
Cooperate           R           S
Defect              T           P

One of the most important issues in evolving game-playing strategies is
their representation. There are two different possible representations [21.3,
21.7, 21.8], both of which are lookup tables that give an action for every
possible contingency. In this paper, the representation of Axelrod [21.1] for
the 2IPD game is used.

In this scheme, each genotype is a lookup table that covers every possible
history of the last few steps. A history in such a game is represented as a
binary string of 2l bits, where the first l bits represent the player’s own
previous l actions (most recent to the left, oldest to the right), and the
other l bits represent the previous actions of the other player. For example,
during a game of 2IPD with a remembered history of 2 steps, i.e., l = 2, one
player might see this history:

l = 2: Example history 11 01

The first l bits, 11, mean that this player has defected (a ‘1’) on both of
the previous l = 2 steps, while the other player cooperated (0) on the most
recent step and defected (1) on the step before, as represented by 01.

For the 2IPD game remembering l previous steps, there are 2^{2l} possible
histories. The genotype therefore contains an action (cooperate “0,” or defect
“1”) for each of these possible histories, so at least 2^{2l} bits are needed
to represent a strategy. At the beginning of the game, there are no previous
l steps of play from which to look up the next action, so each genotype should
also contain extra bits that define the presumed pre-game moves. The total
genotype length is therefore 2^{2l} + 2l bits.
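
The lookup-table representation can be illustrated with a short Python sketch. The layout chosen here, with the 2^{2l} table entries first and the 2l pre-game bits at the end (own moves, then the opponent's), is an assumption made for illustration, as are the function names; the text only fixes the total length and the history encoding.

```python
def genotype_length(l):
    """2^(2l) table entries plus 2l presumed pre-game bits."""
    return 2 ** (2 * l) + 2 * l


def next_move(genotype, own, opp):
    """own/opp: lists of the last l moves, most recent first (0 = C, 1 = D)."""
    index = int("".join(map(str, own + opp)), 2)
    return genotype[index]


def play(g1, g2, l, rounds):
    """Play two genotypes against each other; returns a list of (move1, move2)."""
    table = 2 ** (2 * l)
    # Pre-game bits: the first l are the player's own moves, the next l the opponent's.
    own1, opp1 = list(g1[table:table + l]), list(g1[table + l:])
    own2, opp2 = list(g2[table:table + l]), list(g2[table + l:])
    moves = []
    for _ in range(rounds):
        m1, m2 = next_move(g1, own1, opp1), next_move(g2, own2, opp2)
        moves.append((m1, m2))
        # Shift histories: the newest move enters on the left, the oldest drops out.
        own1, opp1 = [m1] + own1[:-1], [m2] + opp1[:-1]
        own2, opp2 = [m2] + own2[:-1], [m1] + opp2[:-1]
    return moves


print(genotype_length(2))                 # 20 bits for l = 2
tit_for_tat = [0, 1, 0, 1, 0, 0]          # l = 1: copy the opponent's last move
always_defect = [1, 1, 1, 1, 1, 1]
print(play(tit_for_tat, always_defect, 1, 3))   # [(0, 1), (1, 1), (1, 1)]
```

For the example history 11 01 with l = 2, the lookup index is binary 1101, that is, entry 13 of the 16-entry table.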

In the IPD game, each player can be regarded as an agent that has its own
strategy, is motivated by getting a better payoff, and has a confidence within
its group. Agents can form a coalition as long as they can get a higher payoff
than other agents or survive for a long time. The properties of the agent for
the IPD game are shown in Table 21.2.



Table 21.2. Agent model to play 2IPD game.

Property      Role
ID            unique identifier
History       keep previous moves
Strategy      information for next move
BelongTo      information of coalition
Confidence    proportion of participation in move in coalition
Rank          rank in coalition



21.3 Cooperative Co-evolution of Strategies 

21.3.1 Forming Coalition 

It is very hard to find one fixed strategy that can play adaptively against
changing opponents in the IPD game. Several methods that utilize multiple
better strategies, such as gating, have been widely used to improve the
generalization ability. Speciated strategies in the IPD game can be obtained
by sophisticated evolutionary methods such as fitness sharing [21.6].

In this paper, we attempt to obtain better strategies during game playing
with the idea of coalition. In social and economic systems, individuals often
form a strategy coalition to gain more benefit than other individuals or to
survive. In the IPD game, multiple strategies can form coalitions with the
same motivation. We define the conditions under which a coalition of
strategies can be formed as follows. Two better strategies belong to the same
coalition,

1. when the game between them brings bad payoff, or 

2. when combining them results in good payoff. 

In either cases, two strategies must be different because we do not have to 
duplicate the same strategies in a coalition. 

After that, a confidence is given to each agent in proportion to its rank.
The confidence determines the agent's rate of participation in the move of
the coalition: the next move of the coalition is determined by the sum of
the confidences of the agents belonging to it.
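As a rough illustration of how summed confidences could determine a coalition's move, the following Python sketch uses a confidence-weighted vote. The text does not give the exact aggregation rule, so the tie-breaking rule, the rank-to-confidence mapping, and the function names are assumptions made only for this example.

```python
def rank_confidences(n: int) -> list[float]:
    """Hypothetical mapping: confidence proportional to rank, with the
    best-ranked agent (index 0) receiving the largest share."""
    total = n * (n + 1) / 2
    return [(n - r) / total for r in range(n)]


def coalition_move(proposals: list[int], confidences: list[float]) -> int:
    """Confidence-weighted vote over the agents' proposed moves.

    proposals[i] is agent i's move (0 = cooperate, 1 = defect).
    Assumption: the coalition defects only if the summed confidence
    behind defection exceeds that behind cooperation.
    """
    defect = sum(c for m, c in zip(proposals, confidences) if m == 1)
    cooperate = sum(c for m, c in zip(proposals, confidences) if m == 0)
    return 1 if defect > cooperate else 0


confidences = rank_confidences(4)                # [0.4, 0.3, 0.2, 0.1]
move_a = coalition_move([1, 0, 0, 0], confidences)  # top agent is outvoted
move_b = coalition_move([1, 1, 0, 0], confidences)  # two top agents prevail
```

In this sketch a single highly ranked agent does not dictate the move, but the two highest-ranked agents together do.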



21.3.2 Evolving Strategy Coalition 

In order to evolve coalitions, a coalition whose fitness is below the average
fitness of the agents in the population should be removed, and new coalitions
should be generated by crossover of coalitions in the evolutionary process.
Here, crossover exchanges agents between coalitions. A coalition maintains
better agents and removes worse agents from the population. Hence, only
strong agents are maintained in the population, and new agents are generated
by mixing them within a coalition, so that the population is not evolved by
weak agents. Figure 21.1 shows the procedure for generating new agents from
those in a coalition. Pairs of agents are selected at random from the
coalition and their strategies are mixed, producing as many new agents as
there are agents in the coalition.




Fig. 21.1. Generation of new agents by mixing agents within coalition to prevent 
the population from being evolved by weak agents. 



21.3.3 Gating Strategies in Coalition 

Each agent has a confidence that determines its share in the move of the
coalition. A coalition with fixed confidences would disappear in the course
of evolution, because it would have difficulty adapting to changing
opponents. To solve this problem, we adjust the confidences of the agents
so that the coalition performs well against changing opponents.

To improve the adaptivity of a coalition, techniques such as opponent
modeling and gating can be used. Opponent modeling builds a model of the
opponent's strategy and then changes the player's own strategy to be optimal
against the current opponent. Since it is difficult to model the opponent's
strategy precisely, Darwen and Yao proposed a gating method to improve
generalization ability [21.6]. In this method, the player looks in the last
generation of the population for strategies similar to the opponent, and
plays the strategy from that population that is optimal against them.

This paper uses a strategy coalition that keeps a history table of its own
and the opponent's moves and uses this information to change the confidences
of its strategies according to changes in the opponent's actions. This has
the advantage of finding the optimal action for the moves kept in the
history. Figure 21.2 shows the modified IPD game structure, including the
evolution of confidences. The confidences in a coalition are randomly
initialized as real numbers from zero to two. The confidence table contains
the confidences of all agents for every possible combination of history.
The training set for adjusting confidences consists of several well-known
strategies such as TFT, Trigger, CDCD, and so on [21.4, 21.5].

During evolution, the confidences that lead to good results are selected
from the population of coalitions. Crossover exchanges confidences between
coalitions selected from the population, and mutation changes a specified
confidence into a random real number from zero to two.
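The genetic operators on confidences described above can be sketched in Python as follows. The confidence table is flattened here into a plain list of reals, and one-point crossover is used; both choices, as well as the function names and the per-entry mutation test, are assumptions for illustration rather than the exact procedure used in the experiments.

```python
import random

CONF_MIN, CONF_MAX = 0.0, 2.0  # confidences are real numbers from zero to two


def random_confidences(n: int, rng: random.Random) -> list[float]:
    """Randomly initialize n confidences in [CONF_MIN, CONF_MAX]."""
    return [rng.uniform(CONF_MIN, CONF_MAX) for _ in range(n)]


def crossover(a: list[float], b: list[float], rng: random.Random):
    """One-point crossover (assumed): exchange the tails of two confidence
    lists of equal length (at least 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(conf: list[float], rate: float, rng: random.Random) -> list[float]:
    """Replace each confidence, with probability `rate`, by a new random
    real number from zero to two."""
    return [rng.uniform(CONF_MIN, CONF_MAX) if rng.random() < rate else c
            for c in conf]


rng = random.Random(0)
parent_a = random_confidences(8, rng)
parent_b = random_confidences(8, rng)
child_a, child_b = crossover(parent_a, parent_b, rng)
child_a = mutate(child_a, rate=0.001, rng=rng)
```

A seeded `random.Random` instance is passed explicitly so that a run can be repeated.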





[Figure labels: Confidence 1, Confidence 2, ..., Confidence n]

Fig. 21.2. The components of game for evolving the confidences of coalition.

21.4 Experimental Results

We conducted two experiments on the 2IPD game with the conventional payoff
function. The first obtains strategy coalitions using co-evolutionary
learning, and the second evolves the confidences of the obtained coalitions
through another round of co-evolutionary learning.

21.4.1 Evolution of Strategy Coalition 

To obtain strategy coalitions, we use a population size of 50, a crossover
rate of 0.6, and a mutation rate of 0.001. One-point crossover with an
elite-preserving strategy is adopted. The history size is 2, and the maximum
number of agents within a coalition is one third of the population. The
number of coalitions in the population is restricted to fewer than 10.

Figure 21.3 shows the average fitness of coalitions in the evolutionary
process. At the beginning of the evolution, the average fitness of coalitions
is higher than that of the agents in the population. However, this difference
decreases as time goes by. This does not mean that the adaptivity of
coalitions decreases. Rather, the agents in the population do not know how
to play against their opponents at the beginning of the game, because they
are initialized at random, and as time goes by many agents learn how to deal
with the opponent's moves. In other words, the agents in the population also
gradually evolve to adapt to their environment.




Fig. 21.3. Average fitness of coalitions and agents in the population. Solid lines
are for coalitions and dashed lines are for agents.

21.4.2 Gating Strategies

For the experiment on adjusting the confidences of agents in a coalition, we
use a population size of 50, one-point crossover with a rate of 0.6, a
mutation rate of 0.001, and μ-λ selection with elite preserving. The history
size is two, and the training set consists of seven well-known strategies and
a random strategy. Table 21.3 describes the strategies in the training set,
and agents in a coalition that resulted from the evolution of strategy
coalitions are listed in Table 21.4. To test the generalization ability of
the evolved coalition, we selected the 30 top-ranked agents in a population
of 300 (shown in Table 21.5) and conducted ten runs in which the evolved
coalition plays 2IPD games against them in round-robin fashion.



Table 21.3. Training set for evolving confidence of coalition.

Strategy   Characteristics
TFT        initially cooperates, and then follows opponent
Trigger    initially cooperates, but once opponent defects, continuously defects
TF2T       similar to TFT, but defects for opponent's 2 defections
AllD       always defects
CDCD       cooperates and defects in turn
CCD        cooperates two times and defects
C10DAll    cooperates before 10 rounds and then always defects
Random     moves at random
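The training strategies of Table 21.3 can be written compactly as functions of the two move histories. The Python sketch below is one possible reading of the table: the function names, the interpretation of TF2T as defecting after two consecutive opponent defections, and the reading of C10DAll as cooperating in the first ten rounds are assumptions, since the table gives only informal descriptions.

```python
import random

C, D = 0, 1  # cooperate, defect


def tft(own, opp):       # cooperate first, then copy the opponent's last move
    return C if not opp else opp[-1]

def trigger(own, opp):   # cooperate until the opponent defects once
    return D if D in opp else C

def tf2t(own, opp):      # defect only after two consecutive defections
    return D if opp[-2:] == [D, D] else C

def all_d(own, opp):     # always defect
    return D

def cdcd(own, opp):      # alternate cooperate, defect
    return C if len(own) % 2 == 0 else D

def ccd(own, opp):       # cooperate twice, then defect, repeated
    return D if len(own) % 3 == 2 else C

def c10dall(own, opp):   # cooperate for the first 10 rounds, then defect
    return C if len(own) < 10 else D

def random_strategy(own, opp):
    return random.choice((C, D))


def play(strategy_a, strategy_b, rounds):
    """Play a 2IPD match and return both move histories."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        hist_a.append(a)
        hist_b.append(b)
    return hist_a, hist_b


moves_tft, moves_alld = play(tft, all_d, 4)
```

For instance, TFT against AllD cooperates once and then defects for the rest of the match.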



Table 21.4. An example of agents in coalition.

History   Lookup Table
0100      0101110110111101
1111      0101101010011111
0000      0000101010110101

Table 21.5. 30 opponent strategies that are extracted from the initial population
and top ranked in the population.

History   Lookup Table        History   Lookup Table
1000      0111101111101110    1111      1101111001110011
0100      1101111111110011    1100      0101100111010001
1000      1111111000000111    0111      0011111101010010
0111      0111111110110111    1111      0111011101010111
1100      0111001101010011    1111      0101101100111100
0001      1011111001011100    0111      1001110101010110
0001      0011010111111110    1101      0001000110011010
1100      0001110110110010    0010      1101110011111101
1000      1101110011011101    1001      1001100101011000
0010      1101101111000110    1000      1101000111101010
1111      1111101100011011    0110      1001011001110110
0110      0111010011011111    1011      0101111101110010
1010      1011010111111100    0001      0011110111011000
1101      1101000101011110    1110      0111010101110001
1100      1101011110111011    1011      0101011111110100



In the experiments, the fitness of the coalition increases gradually, and the
coalitions show adaptive behavior: they cooperate with conditional
cooperators such as TFT, Trigger and TF2T, and defect against
defectors. Coalitions defeat or tie with the C10DAll and CDCD strategies and
always defeat the random strategy. Figure 21.4 shows an example result of
evolving confidences.

In the test of the generalization ability of the strategy coalition, the
confidences are varied as the opponents change. The experimental results
indicate that the coalition obtained by evolving the confidences of its
strategies performs better than most of the training set, except AllD and
Trigger, in 2IPD games against the 30 top-ranked opponents in the initial
population, as shown in Table 21.6.



Fig. 21.4. Average fitness of strategy coalition. [Horizontal axis: Generation]

Table 21.6. Performance against opponent strategies.

Strategy   Wins        Ties      Avg.        Opp. Avg.
Before     8.64±4.9    6±2.19    1.84±0.28   1.75±0.59
After      18.55±0.5   4±0.63    2.16±0.07   0.92±0.29
TFT        8           0         1.70        1.77
Trigger    30          0         2.13        0.80
TF2T       7           0         1.54        2.40
AllD       30          0         2.17        0.7
CDCD       0           0         1.05        2.75
CCD        0           0         0.91        3.34
C10DAll    27          0         1.97        1.12

21.5 Concluding Remarks

We use the strategy coalition to obtain several better strategies in the IPD
game. A strategy coalition consists of agents and holds a confidence for each
agent. This confidence plays an important role in determining the next move
of the coalition. We have obtained the strategy coalition using
co-evolutionary learning, and evolved the confidences to adapt to a
well-known training set using a genetic algorithm. In the simulation results,
the evolved coalitions show adaptivity: they cooperate in games with
conditional cooperators such as TFT, Trigger and TF2T, but defect against the
AllD strategy. In addition, the coalition defeats the random strategy and
defeats or ties with the CDCD and C10DAll strategies. In the test of
generalization ability, the evolved coalition plays better than the training
strategies, except AllD and Trigger, in games against the 30 top-ranked
strategies of the initial population.

Although we have used the 2-player IPD game in this paper, some of the
results we have obtained may be applicable to more complex games. For
example, it would be interesting to investigate how coalitions could be
formed among different countries in the world, among different parties in a
country, in the commercial market, etc.

22. Social Interaction as Knowledge Trading Games

In this paper, we propose knowledge transaction as a basic constituent of
social interaction. Knowledge transactions among agents with heterogeneous
knowledge are formulated as knowledge trading games. Each agent has an
idiosyncratic utility function defined over his private knowledge and the
common knowledge shared with the other agents. We consider two types of
utility functions, convex and concave. Knowledge transactions are formulated
as symmetric and asymmetric coordination games, depending on the combination
of trading agents with these different types of utility functions. Knowledge
transactions in an organization are formulated as a continuum of
heterogeneous games. We investigate what characteristics of an organization
promote knowledge transaction or discourage the sharing of common knowledge.



22.1 Introduction 

The study of knowledge creation has begun to gain new momentum. Nonaka
and his colleagues have developed a new theory of organizational knowledge
creation [22.11]. They focus on both explicit knowledge and tacit (implicit)
knowledge. The key to knowledge creation lies in the mobilization and
conversion of tacit knowledge. They emphasize knowledge creation in two
dimensions, epistemological and ontological. A spiral emerges when
the interaction between tacit and explicit knowledge is elevated dynamically
from a lower ontological level to higher levels. The core of their theory
lies in describing how such a spiral emerges. They present four modes of
knowledge conversion that are created when tacit and explicit knowledge
interact with each other. The four modes, which they refer to as
socialization, externalization, combination, and internalization, constitute
the engine of the entire knowledge creation process. These modes are what the
individual experiences. They are also the mechanisms by which individual
knowledge gets articulated and amplified into and throughout the organization.

The goal of our research is to formalize an economic model of knowledge
creation by focusing on the quantitative aspects of the value of knowledge.
We classify knowledge into two kinds. One is shared knowledge, which is
common to the agents and can be transmitted across agents explicitly. The
other is private knowledge: personal knowledge embedded in individual
experience or created by the individual. In this paper, we treat common
knowledge and private knowledge as basic building blocks in a complementary
relationship. More importantly, the interaction between these two forms of
knowledge is the key dynamic of knowledge creation in an organization of
agents. Knowledge creation at both the individual and the organizational
level is a spiral process in which this interaction takes place repeatedly,
as shown in Fig. 22.1. In an organization, the individual interacts with
other members through knowledge transaction. Knowledge creation takes place
at two levels, the individual and the organization, and it is characterized
by the forms of knowledge interaction and the levels of knowledge creation.

We consider an organization of agents with heterogeneous knowledge, in which
knowledge transactions among agents constitute the basic foundation of
interaction. Each member of the organization has private knowledge and
desires to accumulate both private knowledge and common knowledge. Agents
exchange their private knowledge, and the transacted knowledge is shared as
common knowledge, which in turn helps agents accumulate their private
knowledge. Both the private knowledge of each agent and the common knowledge
of the organization can thus be accumulated through knowledge transaction.
Agents benefit from exchanging their private knowledge if their utility
increases. In a knowledge transaction, each rational agent exchanges his
private knowledge so that his utility is improved. Agents may consider
sharing knowledge with others important for cooperative and joint work, or
they may put a high value on hiding their private knowledge from other
agents. Factors such as the value (worth) of acquiring new knowledge and the
cost of sharing knowledge should be considered.




Fig. 22.1. The Process of Knowledge Creation through Knowledge Transaction

22.2 Knowledge Transaction as Knowledge Trading Games

As the tasks in an organization grow in complexity, ways must be found
to expand existing knowledge, which increases the opportunities for accessing
other knowledge resources [22.2] [22.3] [22.5] [22.10]. Cooperative work,
whether by a team of engineers or by a group of experts, also requires
coordination through sharing common knowledge. Many functions and tasks of
computers are also carried out through transactions among autonomous agents
[22.8] [22.12]. These agents need the right to access knowledge repositories
transparently. A knowledge repository is an accumulated, common knowledge
resource that many users in the same organization can explore, work with,
and make discoveries in. To support safe cooperation and sharing of
knowledge while preserving agents' autonomy, agents should negotiate with
each other on the access rights and deletion policies for knowledge, or the
rights are propagated when necessary.

In this section, we formulate knowledge transaction as noncooperative
games. We consider an organization of agents $G = \{A_i : 1 \le i \le N\}$ with
both private knowledge and common knowledge. They transact their valuable
private knowledge with other agents, and the transacted knowledge can
be shared as common knowledge. Agents benefit from exchanging their
private knowledge if their utility increases. Therefore, in knowledge
transaction, agents mutually trade their private knowledge if and only if
their utilities can be improved.

Each agent $A_i \in G$ has the following two trading strategies:

$S_1$: Trades a piece of his private knowledge

$S_2$: Does not trade    (22.1)

We need to investigate the inductive reasoning process where each agent has
different value judgments on trading. Factors such as the value (worth) of
the knowledge possessed by each agent and the loss from disclosing the
knowledge to others should be considered. The payoffs of agent $A_i$ when he
trades a piece of knowledge are shown as the payoff matrix in Table 22.1,
where $U_i^1$ and $U_i^2$ are his payoffs from transacting when his partner
transacts and does not transact, respectively, and $U_i^3$ and $U_i^4$ are the
corresponding payoffs from not transacting. Depending on the payoffs, we
obtain the following four types of optimal transaction rules for agent
$A_i \in G$.

(Case 1) $U_i^1 > U_i^3$, $U_i^2 > U_i^4$    (22.2)

In this case, the strategy $S_1$ dominates the other strategy. The optimal
strategy is then to transact his private knowledge regardless of the
strategy of his trading partner.

(Case 2) $U_i^1 < U_i^3$, $U_i^2 < U_i^4$    (22.3)

In this case, the strategy $S_2$ dominates the other strategy. The optimal
strategy is not to transact, regardless of the strategy of his partner.







K. Sato and A. Namatame 



Table 22.1. The payoff matrix of agent $A_i$

                                      Trading partner
                                      $S_1$ (transact)   $S_2$ (not transact)
Agent $A_i$   $S_1$ (transact)        $U_i^1$            $U_i^2$
              $S_2$ (not transact)    $U_i^3$            $U_i^4$

(Case 3) $U_i^1 > U_i^3$, $U_i^2 < U_i^4$    (22.4)

In this case, the optimal strategy is determined by the strategy of
his partner. If his partner transacts, the optimal strategy is to transact,
and if his partner does not transact, the optimal strategy is not to transact.

(Case 4) $U_i^1 < U_i^3$, $U_i^2 > U_i^4$    (22.5)

In this case, the optimal strategy also depends on the other agent.
However, if the partner transacts, the optimal strategy is not to transact,
and if the partner does not transact, the optimal strategy is to transact.

In Cases 3 and 4, the optimal strategy is obtained as a function of the
strategy of his trading partner as follows. Let $p$ denote the probability
that the trading partner transacts. Then the expected utility of agent $A_i$
when he chooses $S_1$ or $S_2$ is given by

$U_i(S_1) = p U_i^1 + (1-p) U_i^2$

$U_i(S_2) = p U_i^3 + (1-p) U_i^4$    (22.6)

Then agent $A_i$ will transact if the following inequality is satisfied:

$p U_i^1 + (1-p) U_i^2 > p U_i^3 + (1-p) U_i^4$, $i = A, B$    (22.7)

By aggregating the payoffs in Table 22.1, we define the following parameter,
termed the threshold, associated with each agent $A_i \in G$:

$\theta_i = (U_i^4 - U_i^2)/(U_i^1 + U_i^4 - U_i^2 - U_i^3)$    (22.8)

Then, from the inequality (22.7), agent $A_i$ transacts his knowledge
depending on the following two cases:

(1) When $U_i^1 + U_i^4 - U_i^2 - U_i^3 > 0$, agent $A_i$ transacts if $p > \theta_i$    (22.9a)

(2) When $U_i^1 + U_i^4 - U_i^2 - U_i^3 < 0$, agent $A_i$ transacts if $p < \theta_i$    (22.9b)
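The threshold rule can be checked numerically against the direct comparison of expected utilities. In the Python sketch below, `u1`, `u2`, `u3`, `u4` stand for the agent's payoffs for the strategy pairs (transact, transact), (transact, not transact), (not transact, transact) and (not transact, not transact), with the agent's own strategy listed first; the function names and the numerical payoffs are illustrative assumptions.

```python
def threshold(u1, u2, u3, u4):
    """Threshold of (22.8): (u4 - u2) / (u1 + u4 - u2 - u3)."""
    denom = u1 + u4 - u2 - u3
    if denom == 0:
        raise ValueError("threshold undefined when u1 + u4 - u2 - u3 == 0")
    return (u4 - u2) / denom


def transacts_by_expected_utility(p, u1, u2, u3, u4):
    """Inequality (22.7): compare the expected utilities of S1 and S2."""
    return p * u1 + (1 - p) * u2 > p * u3 + (1 - p) * u4


def transacts_by_threshold(p, u1, u2, u3, u4):
    """Rules (22.9a) and (22.9b), selected by the sign of the denominator."""
    theta = threshold(u1, u2, u3, u4)
    if u1 + u4 - u2 - u3 > 0:
        return p > theta
    return p < theta


case3 = (3.0, 0.0, 1.0, 2.0)  # u1 > u3 and u2 < u4, threshold 0.5
case4 = (1.0, 2.0, 3.0, 0.0)  # u1 < u3 and u2 > u4, threshold 0.5

for payoffs in (case3, case4):
    for p in (0.25, 0.75):
        assert (transacts_by_expected_utility(p, *payoffs)
                == transacts_by_threshold(p, *payoffs))
```

With the Case 3 payoffs the agent transacts only when the partner is likely to transact ($p > 0.5$), while with the Case 4 payoffs the rule is reversed.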



22.3 Knowledge Trading as Symmetric and Asymmetric 
Coordination Games 

In this section, we show that knowledge transaction can be formulated as
symmetric or asymmetric coordination games, depending on the types of the
utility functions of the two agents. In symmetric coordination games, both
agents benefit if they select the same strategy; in asymmetric coordination
games, on the other hand, they are better off if they choose different
strategies.

We define the utility function of each agent as a function of both his
private knowledge and the common knowledge. The utility function of agent
$A_i$ is defined as the semi-linear function of his private knowledge
$\Omega_i$ and the common knowledge $K$:

$U_i(\Omega_i, K) = \Omega_i + v_i(K)$, $i = A, B$    (22.10)

The value $X - v_i(X)$ represents the relative value to agent $A_i$ of holding
knowledge $X$ as private knowledge rather than as common knowledge. If
$X - v_i(X) > 0$, he puts a higher value on knowledge $X$ as private
knowledge. If $v_i(X) - X > 0$, he puts a higher value on knowledge $X$ as
common knowledge. We also consider the following three types of value
functions:

Definition: For a pair of knowledge $X$ and $Y$ $(X \neq Y)$:

(1) If $v_i(X \vee Y) = v_i(X) + v_i(Y)$, the value function $v_i(X)$ is linear.

(2) If $v_i(X \vee Y) > v_i(X) + v_i(Y)$, the value function $v_i(X)$ is convex.

(3) If $v_i(X \vee Y) < v_i(X) + v_i(Y)$, the value function $v_i(X)$ is concave.

If the value function is convex, acquiring common knowledge exhibits
increasing returns. Increased common knowledge brings additional value:
acquiring more common knowledge means gaining more of the experience of other
agents and achieving a greater understanding of how to achieve the common
tasks. On the other hand, if the value function is concave, acquiring common
knowledge exhibits decreasing returns.

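We now consider a knowledge transaction between agent A with his private
knowledge $X$ and agent B with his private knowledge $Y$. The associated
payoffs of both agents in Table 22.1 are given as follows, where the first
argument is the strategy of agent A and the second is the strategy of agent B:

$U_A(S_1, S_1) = \Omega_A - X + v_A(X \vee Y) \equiv U_A^1$

$U_A(S_1, S_2) = \Omega_A - X + v_A(X) \equiv U_A^2$

$U_A(S_2, S_1) = \Omega_A + v_A(Y) \equiv U_A^3$,  $U_A(S_2, S_2) = \Omega_A \equiv U_A^4$    (22.11)

$U_B(S_1, S_1) = \Omega_B - Y + v_B(X \vee Y) \equiv U_B^1$

$U_B(S_2, S_1) = \Omega_B - Y + v_B(Y) \equiv U_B^2$

$U_B(S_1, S_2) = \Omega_B + v_B(X) \equiv U_B^3$,  $U_B(S_2, S_2) = \Omega_B \equiv U_B^4$    (22.12)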

The above payoffs can be interpreted as follows. Once agents decide to
transact their private knowledge, it is disclosed to the other agent and
becomes common knowledge. When both agents decide to trade their private
knowledge, the payoffs of both agents are defined as their values of the
common knowledge minus their values of the private knowledge. If agent A
does not transact and agent B transacts, agent A receives some gain from
learning the new knowledge $Y$. If agent A trades knowledge $X$ and agent B
does not trade, A's private knowledge $X$ becomes common knowledge, and some
value is lost. If neither agent transacts, they receive nothing. Knowledge
trading has unique features that are not found in commodity trading. In
knowledge trading, agents do not lose all the value of their traded
knowledge. They also receive some gain even if they do not trade, provided
that the partner trades. Subtracting $U_i^3$ from $U_i^1$, and $U_i^2$ from
$U_i^4$, we define the following parameters:

$\alpha_A = U_A^1 - U_A^3 = -X + v_A(X \vee Y) - v_A(Y)$

$\beta_A = U_A^4 - U_A^2 = X - v_A(X)$    (22.13)

$\alpha_B = U_B^1 - U_B^3 = -Y + v_B(X \vee Y) - v_B(X)$

$\beta_B = U_B^4 - U_B^2 = Y - v_B(Y)$    (22.14)

Aggregating the payoffs, we define the following parameters, which represent
the value of integrating two independent pieces of knowledge $X$ and $Y$:

$\alpha_A + \beta_A = v_A(X \vee Y) - v_A(X) - v_A(Y)$

$\alpha_B + \beta_B = v_B(X \vee Y) - v_B(X) - v_B(Y)$    (22.15)

The parameter $\beta_i$, $i = A, B$, represents the difference between the
value of the knowledge as private knowledge and its value as common
knowledge. If $\beta_i > 0$, some value of the knowledge is lost when it
changes from private to common knowledge. If $\beta_i < 0$, the value of the
knowledge increases when it is treated as common knowledge. The parameter
$\alpha_i + \beta_i$, $i = A, B$, represents the multiplier effect of
knowledge $X$ and $Y$. If the value functions $v_i(K)$, $i = A, B$, are
convex, $\alpha_i + \beta_i$ is positive, and if they are concave, it is
negative.
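The sign of $\alpha_i + \beta_i$ can be illustrated numerically. The Python sketch below makes a strong simplifying assumption that is not part of the model: a piece of knowledge is represented by a non-negative number, and combining two independent pieces of knowledge ($X \vee Y$) is represented by adding these numbers. The quadratic and square-root value functions are likewise only examples of a convex and a concave value function.

```python
import math


def alpha_beta(x, y, v):
    """Parameters of agent A holding knowledge x and trading with a partner
    holding knowledge y, for a value function v.

    Assumption: the combined knowledge X v Y is modeled as x + y.
    """
    alpha = -x + v(x + y) - v(y)
    beta = x - v(x)
    return alpha, beta


def convex_value(k):
    return k ** 2        # v(x + y) > v(x) + v(y) for x, y > 0


def concave_value(k):
    return math.sqrt(k)  # v(x + y) < v(x) + v(y) for x, y > 0


a_cvx, b_cvx = alpha_beta(2.0, 3.0, convex_value)   # 14.0 and -2.0
a_ccv, b_ccv = alpha_beta(4.0, 9.0, concave_value)

# alpha + beta equals v(x + y) - v(x) - v(y), the multiplier effect.
multiplier_convex = a_cvx + b_cvx     # 25 - 4 - 9 = 12 > 0
multiplier_concave = a_ccv + b_ccv    # sqrt(13) - 2 - 3 < 0
```

The convex example gives a positive multiplier effect and the concave example a negative one, in line with the classification in the text.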

Depending on the signs of the parameters $\alpha_i$, $\beta_i$, $i = A, B$,
the knowledge trading games can be classified into the following two types:

(1) Symmetric Coordination Games: The value functions $v_i(K)$, $i = A, B$,
are convex

If the value functions $v_i(K)$, $i = A, B$, are convex, then we have
$\alpha_i + \beta_i > 0$, $i = A, B$. If both agents have convex value
functions, their value functions defined over the common knowledge exhibit
increasing returns to scale. In this case, the payoff matrix in Table 22.1,
which satisfies the condition of (22.9a), can be transformed into the payoff
matrix in Table 22.2, which is known as a symmetric coordination game. The
coordination game with the payoff matrix of Table 22.2 has two pure-strategy
equilibria, $(S_1, S_1)$ and $(S_2, S_2)$, and one mixed-strategy equilibrium
[22.7] [22.8]. Absent an explanation of how agents coordinate their
expectations on the multiple equilibria, they face the possibility that
one agent expects one equilibrium and the other agent expects the other;
in this case, coordination failure may occur because they select different
strategies.

Table 22.2. The transformed payoff matrix of the knowledge trading.
(i) If $\alpha_i + \beta_i > 0$, $i = A, B$, the value functions are convex.
(ii) If $\alpha_i + \beta_i < 0$, $i = A, B$, they are concave.

                               Agent B
                               $S_1$ (trade)             $S_2$ (no trade)
Agent A   $S_1$ (trade)        $(\alpha_A, \alpha_B)$    $(0, 0)$
          $S_2$ (no trade)     $(0, 0)$                  $(\beta_A, \beta_B)$

Each cell lists (payoff of agent A, payoff of agent B).

(2) Asymmetric Coordination Games: The value functions $v_i(K)$, $i = A, B$,
are concave

If the value functions $v_i(K)$, $i = A, B$, are concave, then we have
$\alpha_i + \beta_i < 0$, $i = A, B$. If both agents have concave value
functions, their value functions defined over the common knowledge exhibit
decreasing returns to scale. In this case, the payoff matrix in Table 22.2
satisfies the condition of (22.9b), and the game is known as an asymmetric
coordination game. The asymmetric coordination game has two pure-strategy
equilibria, $(S_1, S_2)$ and $(S_2, S_1)$, and one mixed-strategy equilibrium.
Absent an explanation of how agents coordinate their expectations on the
multiple equilibria, they face the possibility that one agent expects one
equilibrium and the other agent expects the other; in this case, another
type of coordination failure may occur because they select the same strategy.
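The pure-strategy equilibria of the two game types can be enumerated directly. The Python sketch below uses the transformed payoffs, in which an agent receives $\alpha$ when both trade, $\beta$ when neither trades, and 0 otherwise. The numerical values are illustrative, and the second example additionally assumes that both $\alpha_i$ and $\beta_i$ are negative, which is stronger than $\alpha_i + \beta_i < 0$ alone.

```python
def pure_equilibria(alpha_a, beta_a, alpha_b, beta_b):
    """Pure-strategy Nash equilibria of the transformed 2x2 trading game.

    Strategies: 1 = trade, 2 = no trade. payoff[(sa, sb)] holds
    (payoff of agent A, payoff of agent B).
    """
    payoff = {
        (1, 1): (alpha_a, alpha_b),
        (1, 2): (0.0, 0.0),
        (2, 1): (0.0, 0.0),
        (2, 2): (beta_a, beta_b),
    }
    equilibria = []
    for sa in (1, 2):
        for sb in (1, 2):
            other_a, other_b = 3 - sa, 3 - sb
            # Neither agent can gain by deviating unilaterally.
            a_stays = payoff[(sa, sb)][0] >= payoff[(other_a, sb)][0]
            b_stays = payoff[(sa, sb)][1] >= payoff[(sa, other_b)][1]
            if a_stays and b_stays:
                equilibria.append((sa, sb))
    return equilibria


symmetric = pure_equilibria(1.0, 1.0, 1.0, 1.0)       # alpha, beta > 0
asymmetric = pure_equilibria(-1.0, -1.0, -1.0, -1.0)  # alpha, beta < 0
```

With positive parameters the equilibria are the matching pairs (trade, trade) and (no trade, no trade); with negative parameters they are the mismatched pairs.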



22.4 Aggregation of Heterogeneous Payoff Matrices 

In this section, we consider knowledge transaction in an organization of
agents $G = \{A_i : 1 \le i \le N\}$. Each agent $A_i$ has knowledge $X_i$ to be
transacted. The payoff matrix of each agent also depends on the knowledge to
be transacted. In trading games where there are many agents with
heterogeneous knowledge, it is possible to reason about others only on
average. Therefore, we assume that each agent reasons as if the other agents
hold knowledge of the same value.

Then each agent has the payoff matrix in Table 22.2, reflecting his
judgement on the knowledge trading. We introduce the following parameter,
defined as the threshold of agent $A_i$:

$\theta_i = \beta_i/(\alpha_i + \beta_i) = \{X_i - v_i(X_i)\}/\{v_i(X_i \vee Y) - v_i(X_i) - v_i(Y)\}$    (22.16)

where $Y$ represents the knowledge held by the trading partner of agent
$A_i$. The denominator of the threshold in (22.16) represents the multiplier
effect of sharing knowledge, and the numerator represents the cost of the
trading.

From the analysis of the previous section, we can classify the knowledge
trading games into the following two types.

(1) The value function $v_i(K)$ is convex.

In this case, agent $A_i$ plays the symmetric coordination game. Suppose
the proportion of agents in $G$ who choose the strategy $S_1$ is $p$
$(0 \le p \le 1)$. From (22.9a) we have the following optimal transaction rule
of agent $A_i$, which is a function of his threshold $\theta_i$:

(i) He should transact if $p > \theta_i$

(ii) He should not transact if $p < \theta_i$    (22.17)

(2) The value function $v_i(K)$ is concave.

In this case, agent $A_i$ plays the asymmetric coordination game. From
(22.9b) we have the following optimal transaction rule of agent $A_i$, which
is a function of his threshold $\theta_i$:

(i) He should transact if $p < \theta_i$

(ii) He should not transact if $p > \theta_i$    (22.18)

Then, we can classify agents with convex value functions into the following
three types depending on the threshold $\theta_i$:

(a) $\theta_i \approx 0$ $(\alpha_i \gg \beta_i)$: Hard-core of trading

From the optimal transaction rule in (22.17) or (22.18), an agent with a
low threshold has the strategy $S_1$ as a dominant strategy. He is willing to
disclose his private knowledge regardless of the other agent's strategy.
Therefore, we define an agent with a low threshold as a hard-core of trading.

(b) $\theta_i \approx 1$ $(\beta_i \gg \alpha_i)$: Hard-core of no trading

An agent with a high threshold has the strategy $S_2$ as a dominant strategy.
He does not trade his knowledge regardless of the other agent's strategy.
We define an agent with a high threshold as a hard-core of no trading.

(c) $0 < \theta_i < 1$: Opportunist

In this case, the optimal strategy depends on his partner's strategy.
Therefore, we define this type of agent as an opportunist.

Each agent has an idiosyncratic payoff matrix reflecting his own value
judgements about knowledge trading. The payoff matrix of Table 22.2 is
characterized by the threshold defined in (22.16). Therefore, we aggregate
the heterogeneous payoff matrices, one for each member of the organization,
and represent them as the distribution of thresholds. As examples, we
consider several threshold distributions in Fig. 22.2. An organization with
the threshold distribution in Fig. 22.2(a) consists of many hard-cores of
trading with low thresholds. An organization with the threshold distribution
in Fig. 22.2(b) consists of many hard-cores of no trading with high
thresholds. An organization with the threshold distribution in Fig. 22.2(c)
consists of opportunists with intermediate thresholds. An organization with
the threshold distribution in Fig. 22.2(d) consists of both hard-cores of
trading and hard-cores of no trading.

Fig. 22.2a-d. Distribution functions of threshold in an organization. (a) An
organization of hard-core of trading (b) An organization of hard-core of no trading
(c) An organization of opportunistic agents (d) An organization of both hard-core
of trading and hard-core of no trading



22.5 The Collective Behavior in Knowledge Transaction 

In this section, we investigate the long-run collective transaction in an
organization. We provide an evolutionary explanation of the collective
behavior, motivated by work in evolutionary games [22.4] [22.9] [22.13].
At any given moment, a small fraction of the organization is exogenously
given the opportunity to observe the exact distribution of strategies in the
organization and takes the best response against it.




[Figure label: Rational agent]

Fig. 22.3. Emergent Collective Behavior in Knowledge Transaction



The heterogeneity of the organization $G$ can be represented by the
distribution function of the agents' thresholds. We denote the number of
agents in $G$ with the same threshold $\theta$ by $n(\theta)$, which is
approximated by the continuous function $f(\theta)$, defined as the density
function of the threshold in $G$. The proportion of agents whose thresholds
are less than $\theta$ is then given by

$F(\theta) = \int_{\lambda \le \theta} f(\lambda)\, d\lambda$    (22.19)

which is defined as the cumulative distribution function of the threshold in
$G$. We classify the collective behaviors into the following two types.




Fig. 22.4a-d. The dynamic process of the collective knowledge transaction. (a) An
organization of hard-core of trading (b) An organization of hard-core of no trading
(c) An organization of opportunistic agents (d) An organization of both hard-core
of trading and hard-core of no trading



(1) An organization of agents with convex value functions

In this case, each pair of agents plays the symmetric coordination game.
We denote the proportion of agents trading at the $t$-th transaction by
$p(t)$. Since the optimal transaction rule of an agent with a convex value
function is given in (22.17), the agents whose thresholds satisfy
$p(t) > \theta_i$ trade at the next time period. The proportion of agents who
trade at the next time period $t + 1$ is then given by $F(p(t))$. Therefore,
the proportion of agents who trade is described by the following dynamics:

$p(t + 1) = F(p(t))$    (22.20)

The dynamics is at an equilibrium at

$p^* = F(p^*)$    (22.21)

As specific examples, we consider knowledge transaction in the organization
$G$ with the threshold distribution functions in Fig. 22.2.

(Case 1-1) The distribution function of threshold is given in Fig. 22.2(a). 
The dynamics of the knowledge transaction in this case is shown in 
Fig. 22.4(a). The dynamics has the unique stable equilibrium $p = 1$, where 
all agents transact. 

(Case 1-2) The distribution function of threshold is given in Fig. 22.2(b). 
The dynamics of the knowledge transaction in this case is shown in 
Fig. 22.4(b). The dynamics has the unique stable equilibrium $p = 0$, where 
no agent transacts. 

(Case 1-3) The distribution function of threshold is given in Fig. 22.2(c). 
The dynamics of the knowledge transaction in this case is shown in 
Fig. 22.4(c). The dynamics has the two stable equilibria $p = 0$ and $p = 1$. 
If the initial proportion of agents who transact, $p(0)$, is greater than 0.5, 
the dynamics converges to $p = 1$; if it is less than 0.5, the dynamics 
converges to $p = 0$. 

(Case 1-4) The distribution function of threshold is given in Fig. 22.2(d). 
The dynamics of the knowledge transaction in this case is shown in 
Fig. 22.4(d). The dynamics has the unique stable equilibrium $p = 0.5$, where 
half of the agents transact their knowledge. 
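
The four cases can be reproduced numerically by iterating (22.20). The sketch below is only an illustration: the four functions F are hypothetical smooth stand-ins chosen to have the qualitative shapes described for Fig. 22.2 (thresholds concentrated near 0, near 1, around 0.5, or at both ends); they are assumptions, not the curves used in the chapter. 

```python
# Hypothetical threshold distribution functions F on [0, 1].
# The shapes are assumptions that mimic the four organizations of Fig. 22.2.

def hard_core_trading(p):
    # Most thresholds lie near 0, so F rises quickly.
    return 1.0 - (1.0 - p) ** 4

def hard_core_no_trading(p):
    # Most thresholds lie near 1, so F stays low for most p.
    return p ** 4

def opportunistic(p):
    # Thresholds concentrated around 0.5, giving an S-shaped F.
    return 3.0 * p ** 2 - 2.0 * p ** 3

def both_hard_cores(p):
    # Half of the agents near threshold 0, half near threshold 1.
    return 0.5 * hard_core_trading(p) + 0.5 * hard_core_no_trading(p)

def iterate(F, p0, steps=200):
    """Apply p(t+1) = F(p(t)) for `steps` periods and return the final p."""
    p = p0
    for _ in range(steps):
        p = F(p)
    return p

if __name__ == "__main__":
    runs = [
        ("Case 1-1", hard_core_trading, 0.01),
        ("Case 1-2", hard_core_no_trading, 0.99),
        ("Case 1-3, p(0) > 0.5", opportunistic, 0.6),
        ("Case 1-3, p(0) < 0.5", opportunistic, 0.4),
        ("Case 1-4", both_hard_cores, 0.1),
    ]
    for name, F, p0 in runs:
        print(f"{name}: p(0) = {p0} -> p = {iterate(F, p0):.6f}")
```

Under these assumed shapes the iteration settles at p = 1, p = 0, either p = 1 or p = 0 depending on which side of 0.5 it starts, and p = 0.5, respectively. 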

(2) An organization of agents with concave value functions 
In this case, each pair of agents plays an asymmetric coordination game. 
Let $p(t)$ denote the proportion of agents who transact at the t-th 
transaction. Since the optimal transaction rule of an agent with a concave 
value function is given in (22.18), agents whose thresholds satisfy 
$p(t) < \theta$ transact at the next transaction $t+1$. The proportion of 
agents with thresholds greater than $p(t)$ is $1 - F(p(t))$, so the 
proportion of agents who transact at the next time period is given by the 
following dynamics: 

$$p(t+1) = 1 - F(p(t)) \qquad (22.22)$$



(Case 2-1) The distribution function of threshold is given in Fig. 22.2(a). 

(Case 2-2) The distribution function of threshold is given in Fig. 22.2(b). 

(Case 2-3) The distribution function of threshold is given in Fig. 22.2(c). 




Fig. 22.5a-b. The dynamics of the knowledge transaction process with cycles 
(a), and with convergence (b) 



In the above three cases there is no equilibrium, and starting from any 
initial proportion $p(0)$, the dynamics cycles between the two extreme points 
$E_1$: $p = 0$ and $E_2$: $p = 1$. Once it reaches one of these extreme 
points, it visits each of them alternately. When this cycle occurs, there are 
two situations, one where all agents trade and one where no agent trades, 
which follow each other from one period to the next, and the cycle repeats 
forever. This phenomenon is known as a coordination failure. 

(Case 2-4) The distribution function of threshold is given in Fig. 22.2(d). 
The dynamics has the unique stable equilibrium $p = 0.5$, where half of the 
agents transact their knowledge, as shown in Fig. 22.5(b). This is the same 
property as in the symmetric coordination games. 
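
The cycling and the convergence can both be seen by iterating (22.22) directly. The following sketch is illustrative only: the two functions F are hypothetical smooth stand-ins for an organization of opportunistic agents and for an organization with both hard cores, not the distributions of Fig. 22.2. 

```python
# Hypothetical threshold distribution functions (assumed shapes).

def opportunistic(p):
    # Thresholds concentrated around 0.5 (S-shaped F).
    return 3.0 * p ** 2 - 2.0 * p ** 3

def both_hard_cores(p):
    # Half of the thresholds near 0, half near 1.
    return 0.5 * (1.0 - (1.0 - p) ** 4) + 0.5 * p ** 4

def trajectory(F, p0, steps=200):
    """Iterate p(t+1) = 1 - F(p(t)); return the path [p(0), ..., p(steps)]."""
    path = [p0]
    for _ in range(steps):
        path.append(1.0 - F(path[-1]))
    return path

if __name__ == "__main__":
    cyc = trajectory(opportunistic, 0.6)
    conv = trajectory(both_hard_cores, 0.1)
    print("opportunistic, last two periods:", cyc[-2:])
    print("both hard cores, last period:", conv[-1])
```

With these assumed shapes the first trajectory ends up alternating between p = 0 and p = 1, while the second oscillates with shrinking amplitude and settles at p = 0.5. 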



22.6 Conclusion 

The aim of this paper was to formalize an economic model of knowledge 
creation by focusing on the quantitative aspects of the value of knowledge. 
We classified knowledge into two kinds, shared knowledge and private 
knowledge. We focused on common knowledge and private knowledge as basic 
building blocks in a complementary relationship. The knowledge transactions 
were formulated as non-cooperative games. Different agents necessarily have 
different payoff structures. We proposed a new type of strategic games, 
heterogeneous games. We obtained and characterized the optimal transaction 
rules for each type of the transaction games. Through knowledge transaction, 
agents can accumulate organizational knowledge as shared and common 
knowledge. We characterized the dynamic behavior of knowledge transaction in 
the long run. We obtained completely different collective behaviors in the 
knowledge transaction for organizations of agents with convex value functions 
and with concave value functions. 



We propose a framework called “World Trade League,” which is expected 
to become a standard problem in multi-agent economics. The first purpose of 
World Trade League is to propose a network game in the context of economic 
and social systems. Such a game in World Trade League is executed by several 
players (countries), where each country consists of heterogeneous agents 
such as product makers, service suppliers, financial companies, government, 
and so on. A player (country) participating in the game is evaluated according 
to its contribution to the development of the international economic system 
and environment protection, in addition to the development of its own country. 
The second purpose of World Trade League is to provide a standard problem 
for pure multi-agent simulations in an economic context which many 
researchers can commonly analyze. The software to execute World Trade League 
is supplied by X-Economy System, where the X-SS protocol is used as the common 
communication protocol among agents. 



23.1 Introduction 

Multi-agent approaches have recently become a focus of research in the 
modeling and analysis of economic systems, in contrast with conventional 
economic approaches such as equilibrium theory and dynamical systems. Many 
researchers, however, model and analyze their own economic problems, and 
simulation results and computational techniques are seldom shared. 

In such a context, we propose a framework called “World Trade League” 
[23.1], which is expected to become a standard problem in multi-agent 
economics. We have two main purposes in proposing a standard problem. The 
first one is to propose a network game which is executed by several players 
connected via networks. Each player represents a country, where each country 
consists of heterogeneous agents such as product makers, service suppliers, 
households, financial companies, central bank, government, and so on. 

A player (country) participating in the game is evaluated according to 
its contribution to the development of the international economic system and 
environment protection, in addition to the development of its own country. 
This regulation is designed to give the game a cooperative and ecology-minded 
atmosphere rather than one of selfish competition in international economic 
systems. 

The second purpose is to provide a standard problem for pure multi-agent 
simulations in an economic context, where many researchers can commonly 
analyze the problem and share the simulation results and techniques. For 
both purposes, the software to execute World Trade League will be supplied 
as a common library, X-Economy System [23.2, 23.3], where the X-SS protocol 
[23.4] is used as the common communication protocol among agents. We design 
the computational framework to include other applications, such as 
education, training, entertainment, economic experiment, and so on. 

In this paper, we describe the concept and background of World Trade 
League, with the basic design of network games, simulation frameworks, 
computational libraries, and communication protocols. 



23.2 Concept of World Trade League 

World Trade League is a game where each player prepares a country as a 
multi-agent system consisting of economic agents such as agriculture, ma- 
nufacturing, distribution, finance, government, and so on [23.1]. Although 
World Trade League can provide several types of games by changing the con- 
figurations and regulations of the system, we explain a full set of the game 
in the rest of this paper. An agent in a country should behave as follows. 

— It collects public information which is open to any agent. 

— It decides what to do now (or to do nothing) by selecting one of the 
options which are currently available. 

These two parts are essential for agent design and implementation. In addition 
to them, the following regulations exist. 

— Agents in a country should behave independently, i.e., a country has no 
centralized control system. In order to enforce this restriction, 
communication among agents, even within a country, is open to the public. 
Indirect control is possible through such communication. 

— An agent in a country can make international trades with agents in other 
countries. This is done based on mutual agreement between two agents. 
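
The two essential parts of an agent (collecting public information and choosing an available option or doing nothing) together with the rule that communication is public can be pictured with a small sketch. All class and function names below are hypothetical illustrations; they are not part of X-Economy System or the X-SS protocol. 

```python
class Agent:
    """Hypothetical base class: an agent sees only public information."""

    def __init__(self, name):
        self.name = name

    def decide(self, public_info, options):
        """Return one of the available options, or None to do nothing."""
        raise NotImplementedError


class CautiousSeller(Agent):
    """Toy agent: sells only when the public price is high enough."""

    def decide(self, public_info, options):
        if "sell" in options and public_info.get("price", 0.0) >= 1.0:
            return "sell"
        return None  # do nothing this round


def run_round(agents, public_info, options):
    # Every decision goes into a log that any agent may read, mirroring
    # the rule that communication is open to the public. There is no
    # central controller: each agent decides independently.
    public_log = []
    for agent in agents:
        public_log.append((agent.name, agent.decide(public_info, options)))
    return public_log


if __name__ == "__main__":
    agents = [CautiousSeller("maker-1")]
    print(run_round(agents, {"price": 1.2}, ["buy", "sell"]))
```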

It is a game where heterogeneous agents in several countries collect 
information, manufacture goods, make international trades, exchange currency, 
and compete with other countries in order to achieve their own economic 
development and international collaboration while protecting the natural 
environment (Fig. 23.1). 

Fig. 23.1. Concept of World Trade League. 



23.3 Elements of World Trade League 

23.3.1 Behavior Options of Agents and Market Structure 

Agents which constitute a country have the following behavior options. 

A) Manufacturing and Service Agents 

To plan the amount of goods/services production and to execute it, i.e., 
to borrow funds from banks, issue stock and bonds, purchase materials, 
invest in plant and equipment, hire labor power, produce, and sell 
products. 

B) Distribution Agent 

To carry goods and people. 

C) Bank Agent 

To loan funds to other agents after determining how much to loan to each 
specific agent. To collect deposits from other agents. To make trades 
in financial markets. 

D) Central Bank Agent 

To decide the interest rate and to loan funds to banks. To make trades in 
markets. 

E) Government Agent 

To collect taxes and distribute subsidies. To issue national bonds. 

F) Natural Resource Agent 

This is a special agent that represents the natural resources and the degree 
of environmental pollution in a country. It is a passive agent and makes no 
active decisions. 

In addition to agents, markets are prepared in the system. A market provides 
a place where specific goods, funds, capital, or labor power are traded by 
agents in the whole system. One market is prepared in the whole system for 
each specific object to be traded, i.e., food material, food, industrial 
material, industrial goods, labor power, services, national bonds, bonds, 
stock, and so on. 

23.3.2 Game Settings and Complexity 

In order to control the complexity of games played in World Trade League, we 
can make game settings in several ways. In World Trade League, the following 
ways are prepared to modify the complexity of games. 

A) Degree of Economic Evolution 

Complexity of games can be modified according to the degree of historical 
economic evolution as follows. 

1. Medieval Stage: Currency exists, but no financial system like money loan 
exists. 

2. Modern Stage: Indirect financial system, i.e., bank loan and national bond 
exist. 

3. Contemporary Stage: Direct financial system, i.e., stock and company bond 
exist. 

B) The Number of Agents in a Specific Agent Type 

Complexity depends on the number of agents in a specific agent type. 
At the starting point, we use only one agent in a specific agent type in a 
country. 

C) The Number of Countries 

The number of countries (nations) participating in the game greatly 
influences the complexity of the game. If the number is one, the game 
becomes a self-contained game, i.e., an economic simulation of a country. 
The more nations participate in the game, the more complicated the game 
becomes. At the starting point, we assume that from two to five nations 
simultaneously participate in the game. 

D) Symmetric or Asymmetric Game 

1. Symmetric Game: All nations are given the same initial condition at the 
start of the game. 

2. Asymmetric Game: The initial conditions of nations are different. By 
playing the game several times while changing the roles of nations, the 
overall condition for each nation can be made equal. 

23.3.3 Evaluation Function of Players 

Unlike games proposed in previous multi-agent research, several types of 
evaluation functions are prepared in World Trade League, i.e., we can run 
games or simulate economic systems under several types of boundary conditions. 
Players are evaluated by a single function or a combination of the functions, 
e.g., the average of the functions. Each function represents a certain aspect 
of the target economic system, e.g.: 

A) Economic Development of the Country 

It represents competitiveness of the world economic system. 

B) Imbalance of Economic Development among Countries 

It represents cooperativeness of the world economic system. 

C) Stableness of the World Economic System 

It represents stableness of the whole system, in order to avoid sudden 
changes in a country or in the whole system. 

D) Degree of Environment Protection 

It represents ecological coexistence of a country. 

E) The Improvement of Living Standards 

The degree of improvement in the living standards of people in a nation. 

These evaluation functions are composed of 1) GNP, 2) the amount of 
produced goods, 3) the degree of environmental pollution, 4) the degree of 
distribution of produced goods, and so on. 
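
As an illustration of how a player might be scored by a combination of evaluation functions, the sketch below averages two toy functions. The indicator names (`gnp_growth`, `pollution_level`), the two functions and the numbers are all hypothetical; the actual evaluation functions of World Trade League are not specified here. 

```python
from statistics import mean

def economic_development(stats):
    # Toy measure: GNP growth rate of the country (assumed indicator).
    return stats["gnp_growth"]

def environment_protection(stats):
    # Toy measure: 1 minus a normalized pollution level (assumed indicator).
    return 1.0 - stats["pollution_level"]

def combined_score(stats, evaluators):
    """Average of the selected evaluation functions (one possible combination)."""
    return mean([evaluate(stats) for evaluate in evaluators])

country = {"gnp_growth": 0.04, "pollution_level": 0.25}
score = combined_score(country, [economic_development, environment_protection])
print(score)
```

Passing a single function in the list corresponds to evaluating a player by one function only. 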



23.4 Implementation 

23.4.1 System Architecture 

In order to run games of World Trade League, we are now implementing a 
server, a client class structure, and sample client programs. The architecture 
of the system is shown in Fig. 23.2. Although only one country exists in 
this figure, several countries are connected to the server in real games or 
simulations. 

The server consists of two modules, 1) Communication Control module, 
which controls all message transactions among agents, and 2) Database, 
which stores current status and all the history of each agent at micro-level 
and of the whole system at macro-level. 

A module which collects requests from agents and acts as a mediator, such 
as market, is called ‘medium’ in order to distinguish it from regular agents 
existing in a nation. 

All agents and mediums are prepared in a class library, and users who wish 
to join the game can instantiate agent and medium instances from the library. 

Fig. 23.2. System Architecture of World Trade League. 



23.4.2 Communication Protocol X-SS 

In World Trade League, a series of communication protocols called X-SS 
(extensible Social System) Protocol is prepared in order to preserve the 
extensibility of agent communications and game regulations [23.4]. 

Generally speaking, we have to prepare $n(n-1)/2$ types of protocol when 
$n$ types of agents exist. In addition, we have to prepare $n$ types of new 
protocols when adding a new type of agent. This way of defining protocols 
clearly has problems in computational complexity, extensibility, clarity, 
and ease of understanding. 

In X-SS, the protocol definition is not based on agent types, but on objects 
(goods, services, currency, or information) which are traded or exchanged in 
the game. For example, a trade of goods can be represented as an exchange 
of goods and currency, and collection and transmission of information as an 
exchange of information and currency. If information is obtained for free, 
the price measured in currency is set to zero. In addition, because one of 
the objects to be traded or exchanged is almost always currency, we have to 
prepare only $m$ types of protocol, where $m$ is the number of objects to 
be exchanged in the system. Objects to be traded are as follows. 

— Currency: Unique in the whole system, or prepared for each country. Local 
currency can be additionally defined. 

— Goods: Food material, food, industrial material, industrial goods. 






K. Kurumatani and A. Ohuchi 



— Services: Transportation, amusement, general. 

— Labor power 

— Financial Goods: National bond, private bond, stock. 

— Information 

Using the protocol definition, an agent can be characterized by the goods 
which the agent can trade, and the class hierarchy of agents can be clearly 
defined. 
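
The difference between the two ways of counting protocols can be checked with a few lines of arithmetic. In this sketch the list of agent types follows Sect. 23.3.1 and the list of object kinds follows the list above; treating each list entry as exactly one type is an assumption made only for illustration. 

```python
# Agent types as listed in Sect. 23.3.1 (one entry per category A to F).
AGENT_TYPES = [
    "manufacturing/service", "distribution", "bank",
    "central bank", "government", "natural resource",
]

# Kinds of tradable objects in X-SS, as listed above.
OBJECT_KINDS = [
    "currency", "goods", "services",
    "labor power", "financial goods", "information",
]

def pairwise_protocols(n):
    """Number of protocols when one is defined per pair of agent types."""
    return n * (n - 1) // 2

n = len(AGENT_TYPES)
m = len(OBJECT_KINDS)
print("agent-type based:", pairwise_protocols(n))            # one per pair
print("added by a new agent type:", pairwise_protocols(n + 1) - pairwise_protocols(n))
print("object based:", m)                                    # one per object kind
```

With six agent types the pairwise definition needs 15 protocols and a seventh type would add six more, whereas the object-based definition needs one protocol per object kind and is unaffected by new agent types. 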

We prepare several ways to define and implement the protocol and 
communication module, because agents and the server can be implemented in 
several kinds of programming language and they may use several kinds of 
communication infrastructure: 

— XML representation over networks in TCP/IP, UDP 

— CORBA representation over networks in TCP/IP, UDP 

— Java class library which represents message objects 

— C++ class library which represents message objects 

For the latter two types of message object, the communication between 
an agent and the server is carried out by 1) instantiating a message object 
from a message class, and 2) calling a proxy method implemented in the 
communication module of the server with the message object as an argument. 
The details of the protocol are therefore clearly defined by the message 
classes, and we can keep the maintainability and extensibility of the 
protocol. 
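
The two-step pattern (instantiate a message object, then pass it to a proxy method of the communication module) might look roughly as follows. This is a sketch in Python rather than Java or C++, and every name in it (`ExchangeMessage`, `CommunicationModule`, `submit`, the field names) is hypothetical; it does not reproduce the actual X-SS message classes. 

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExchangeMessage:
    """Hypothetical message: an exchange of one object kind against currency."""
    sender: str
    receiver: str
    object_kind: str   # e.g. "goods", "services", "information"
    quantity: float
    price: float       # in currency; 0.0 for free information


class CommunicationModule:
    """Stand-in for the server-side module that receives all messages."""

    def __init__(self):
        self.log = []  # every accepted message, in order of arrival

    def submit(self, message):
        # Plays the role of the 'proxy method': validate, record, acknowledge.
        if message.quantity <= 0 or message.price < 0:
            raise ValueError("quantity must be positive and price non-negative")
        self.log.append(message)
        return len(self.log) - 1  # index of the stored message


if __name__ == "__main__":
    server = CommunicationModule()
    # 1) instantiate a message object, 2) call the proxy method with it
    msg = ExchangeMessage("bakery-A", "household-A", "goods", 10.0, 1.0)
    print(server.submit(msg))
```

Because the message class fixes the fields of each exchange, adding a new agent type in this sketch requires no new message class, only new senders and receivers. 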



23.5 Requirements for Standard Problem in Multi-agent 
Economics 

To propose standard problems in economic and social system research, we 
think that they have to satisfy the following requirements. 

— Validity of the problem as a model of real systems: 

Unlike simulation in natural science, it is impossible to model and 
simulate all the details of the target system in economic and social 
science. We have to extract the essence of the structure and behaviors of 
the target system, and verify whether the simulation result can explain 
the essence of the target system. In other words, the setting of a standard 
problem should fit a suitable abstraction level of the target system. 

— Applicability of several techniques to the problem: 

A standard problem should be open to attack by several types of techniques 
in social science, computer science, and artificial intelligence, e.g., 
dynamical system theory, game theory, agent-based simulation, machine 
learning techniques, and so on. In order to satisfy this criterion, the 
problem should be clearly described in a computational sense and it should 
not include unnatural constraints. 

— Complexity: 

A problem should be complex enough that the best strategy for solving it 
cannot easily be found. Standard problems in artificial intelligence such 
as Chess, Shogi, and Igo have enough complexity in contrast with the 
simplicity and clearness of the game definition and settings. 

— Closed problem rather than open problem: 

A standard problem should be a closed problem rather than an open one. 
Because an open problem is influenced by information brought from outside 
the target world, an information gap among agents may exist. In other words, 
the quantity and the quality of information for each player can vary, and it 
is difficult to keep fairness among the players. This shortcoming caused by 
openness becomes an essential difficulty when executing fair network games 
and strict simulations. 

The game settings of World Trade League are designed to satisfy the above 
requirements. 



23.6 Related Work 

The multi-agent approach to market analysis, called artificial market 
research, is one of the active research areas, and many kinds of analysis 
have been carried out. Many fruitful simulation results have already been 
obtained in this area. 

It seems difficult, however, to design network games in the context of 
markets and trading. One such network game is U-Mart, which is designed 
for a stock futures market [23.5, 23.6]. This kind of approach has the 
shortcoming that information must be given to agents from outside the 
system, e.g., so-called fundamental information (interest rate, benefit 
and business results of companies, perspective of national and international 
economics, etc.) must be unnaturally given to agents. This openness of the 
game crucially spoils the fairness among game participants. 

Some research projects to model and analyze a whole economic system 
as a multi-agent system have started. For instance, the Virtual Economy 
Project [23.7, 23.8] provides a basic economic database for SNA based on 
Exchange Algebra. That approach lacks the idea of agent design and of 
extending the framework to a multi-nation environment. 

Another approach to a whole economic system is the Boxed-Economy Project 
[23.9, 23.10, 23.11]. It designs the templates of economic agents as object 
classes in detail, in order to construct a class structure of economic agents 
from the most generic form to a specific one. 

Our approach in World Trade League is to design both the computational 
framework and the agent class structure simultaneously. In that sense, our 
approach contains both directions of the above two approaches, and it 
provides a flexible common framework for network games, simulation, 
education, training, entertainment, and economic experiment. 

World Trade League as a network game has a common characteristic 
with RoboCup Soccer [23.12], because both games consist of multiple players 
and each player itself is a multi-agent system, although World Trade League 
is a heterogeneous multi-agent system. World Trade League and RoboCup 
Rescue [23.13] both use multiple evaluation functions in order to evaluate 
complex aspects of target social systems. 



23.7 Conclusion 

We have proposed a framework called World Trade League which provides a 
standard problem in multi-agent economics. In the network game of World 
Trade League, several countries compete in order to achieve the economic 
development of their own country while keeping cooperative relations with 
other countries and protecting the environment. 

A question is frequently asked: “Why do you call the game ‘World Trade 
League’ instead of ‘World Trade Game’?” 

The answer is that we hope to find in the game a path to sustainable 
economic development that keeps the whole world economic system developing 
and conserves the natural environment. The game should not become a field 
where a country which pursues only its own benefit obtains the best position. 

We are now designing the details of the network game in World Trade 
League, implementing the common libraries of X-Economy System based on the 
X-SS protocols, and verifying the game as a standard problem in detail. The 
regulations and game settings for the public will be announced in 
forthcoming papers and on the web sites. 



In this paper we analyze an economic system as an agent-based bottom-up 
model. For this purpose we introduce a small national economy called a 
virtual economy and an exchange algebra for state space description. We 
construct a dynamical agent-based simulation model and analyze it. 



24.1 Introduction 

In this paper we construct a Simulation & Gaming model of a virtual 
economy. The virtual economy consists of nine agents: Agriculture, Milling 
Industry, Bread Industry (Bakery), Steel Industry, Machinery Industry, 
Government, Household, Bank and Central Bank. For this purpose an algebraic 
abstraction of the bookkeeping system, which is called an exchange algebra, 
is introduced for describing microeconomic exchange among agents. The 
economic state of each agent is also described by the algebra. Exchange 
algebra is an extension of accounting vector space [24.1, 24.2]. By using 
this algebra we describe systemic properties of economic exchange and 
properties of the economic field. The economic field gives a formal model of 
SNA (System of National Accounts). 

The virtual economy model is illustrated in Fig. 24.1. 

In the model economy, agriculture grows wheat, the milling industry makes 
wheat flour from wheat, the bread industry (bakery) makes bread from flour, 
the steel industry makes steel, and the machinery industry makes machinery 
from steel. In the model we assume that the steel industry needs no 
materials. Household purchases and consumes bread. Machines are purchased by 
industries as capital investments and are used for production. Machines are 
also purchased by government or household; these machines are regarded as 
infrastructure and houses, respectively. A machine depreciates according to 
a scenario. Population increases according to a scenario. Household supplies 
workers to each industry and to the government. Then household receives a 
wage. The government can issue national bonds. The central bank issues bank 
notes and fixes the official bank rate. Household and industries deposit 
money in a bank. A bank lends money. 

T. Terano et al. (Eds.): JSAI 2001 Workshops, LNAI 2253, pp. 218-226, 2001. 

Fig. 24.1. Virtual Economy 



24.2 Agent Based Simulation Model for Virtual 
Economy 

In the virtual economy gaming, players act as government, agriculture, 
milling industry, bakery, steel industry, machinery industry, household, 
bank and central bank depending on their roles. This virtual economy becomes 
a multi-agent model of the economic system of a country. In the economy, 
players or machine agents act as decision makers. The game needs some basic 
assumptions. For example, we have five products and one currency in this 
economy. We also assume proper units for the products and currency as 
follows. “MOU” stands for money unit such as dollar. “WHU” stands for wheat 
unit, “FLU” stands for flour unit, “BRU” stands for bread unit, “STU” stands 
for steel unit and “MAU” stands for machine unit. A machine held by the 
household is regarded as a house. 

We try to construct an agent based simulation model for this economy. 
Fig. 24.2 shows the total design for the agent based simulation of the 
virtual economy. 




Fig. 24.2. Basic Design of the Agent Based Simulation Model of Virtual Economy 



Fig. 24.3 shows a prototype decision making model for a single human 
player. In this paper we introduce two types of dynamical models for this 
virtual economy gaming. The first is called the dictator’s view model. In 
this model a player has to make all decisions for transactions among agents 
of this economy in a term, like a dictator. Table 24.1 shows the decision 
making items for a player in a term. 

The other is called the bird’s eye view model across the terms. In this 
model some decisions are made automatically depending on hidden decision 
making rules. A player makes decisions across time in this model. In the 
former model decisions are made step by step, term by term, whereas in the 
latter model a player has a bird’s eye view across the terms: the player can 
observe the whole period of economic development and makes decisions across 
the terms to achieve his aim in the economy. 

Table 24.2 shows institutional parameters such as subsidy policy and na- 
tional bond policy. A decision is made such as true or false. Table 24.3 shows 
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Table 24.1. Decision Making for Dictator’s View Model 



— — 


Agri 


Flou 


Bake . 


Steel 


Mach . 


House 


Gov . 


CB 


Bank 


Income Tax Rate 


* 


* 


* 


* 


* 


* 


0.1 


* 


* 


Corporate Tax Rate 


* 


* 


* 


* 


* 


* 


0.2 


* 


* 


National Bond Rate 


* 


* 


* 


* 


* 


* 


0 . 01 


* 


* 


Official Bank Rate 


* 


* 


* 


* 


* 


* 


* 


0.01 


* 


Deposits in CB 


* 


* 


* 


* 


* 


* 


* 


0 


0 


Withdraw from CB 


* 


* 


* 


* 


* 


* 


* 


0 


0 


Loan from CB 




* 


* 


* 




* 


* 


1000 


1000 


Refund to CB 




* 


* 


* 




* 


* 


0 


0 


Receive Subsidy 


0 


0 


10 


10 


10 


0 


30 


* 


0 


Deposit Interest 


* 


* 


* 


* 


* 


* 


* 


* 


0.01 


Loan Interest 


* 


* 


* 


* 


* 


* 


* 


* 


0.03 


Buy National Bond(NB) 


0 


0 


0 


0 


0 


0 


0 


* 


* 


Redeem NB 


0 


0 


0 


0 


0 


0 


0 


* 


♦ 


Accept NB by CB 


* 


* 


* 


* 


* 


* 


0 


0 


* 


Redeem NB from CB 


* 


* 


* 


* 


* 


* 


0 


0 


* 


Loan from Bank 


0 


300 


100 


300 


300 


0 


* 


* 


1000 


Redeem to Bank 


0 


0 


0 


0 


0 


0 


* 


* 


0 


Deposit in Bank 


0 


0 


0 


0 


0 


0 


* 


* 


0 


Withdraw from Bank 


0 


0 


0 


0 


0 


0 


* 


* 


0 


Product Price per Unit 


0.2 


0.5 


1 


6.25 


11 


* 


* 


* 


* 


Capital Investment (numbers) 


2 


3 


2 


4 


8 


2 


0 


* 


* 


Sales of Products (Quantity) 


770 


580 


420 


14 


21 


♦ 


* 


* 


* 


Numbers of Employment 


70 


70 


60 


30 


65 


330 


35 


* 


♦ 


Total Wage 


90 


90 


80 


30 


90 


430 


50 


* 


* 



Table 24.2. Institutional Parameters 



Adoption of a Policy : Institutional Parameters 




1 


2 


3 


4 


5 


6 


7 


8 


9 


10 


Subsidy for Half the 
Capital Investment 


FALS 


TRUE 


TRUE 


TRUE 


TRUE 


TRUE 


FALSE 


EALSE 


FALSE 


FALSE 


Subsidy for the Deficit of 
Makers 


FALSE 


TRUE 


TRUE 


TRUE 


TRUE 


TRUE 


TRUE 


TRUE 


TRUE 


TRUE 


Subsidy for Half the 
House Investment 


FALSE 


FALSE 


FALSE 


FALSE 


TRUE 


TRUE 


TRUE 


TRUE 


TRUE 


TRUE 


Issue National Bond under 
guarantee of Central Bank 


FALSE 


FALSE 


FALSE 


FALSE 


FALSE 


EALSE 


TRUE 


TRUE 


TRUE 


TRUE 
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Fig. 24.3. Prototype Decision Ma- 
king Model for a Single Human Player 



Table 24.3. Capital Investment 



Term 


Machine 

available 


Agri 


Flour 


Bakely 


Stee 

1 


Machine 


House 

hold 


Gov. 


Sum 


Stock of 
Machine 


1 


26 


2 


4 


1 


7 


11 


1 


0 


26 


0 


2 


35 


2 


5 


4 


10 


14 


0 


0 


35 


0 


3 


44 


2 


5 


6 


10 


19 


2 


0 


44 


0 


4 


55 


2 


10 


8 


10 


23 


1 


0 


54 


1 


5 


62 


3 


15 


10 


10 


22 


2 


0 


62 


0 


6 


75 


2 


15 


17 


12 


25 


3 


1 


75 


0 


7 


81 


4 


17 


15 


10 


28 


4 


3 


81 


0 


8 


81 


5 


21 


13 


16 


24 


2 


0 


81 


0 


9 


88 


5 


22 


13 


10 


33 


3 


2 


88 


0 


10 


97 


3 


30 


15 


13 


32 


3 


0 


96 


1 
Table 24.4. Management and Political Decisions

Management & Political Decisions of Agents

Term                     1     2     3     4     5     6     7     8     9    10
Income Tax Rate        0.1  0.05  0.05  0.05  0.02  0.02  0.05  0.05  0.05  0.05
Corporate Tax Rate     0.2   0.1   0.2   0.2   0.1   0.1   0.1   0.1   0.1   0.1
National Bond Rate    0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01
Official Bank Rate    0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01
Deposit Interest      0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01
Loan Interest         0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03  0.03
Price Wheat            0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2   0.2
Price Flour            0.5   0.5   0.5  0.52  0.52  0.52  0.52  0.52  0.52  0.52
Price Bread              1     1     1     1     1     1   1.1   1.1   1.1   1.1
Price Steel           6.25  6.25  6.25   6.5   6.5   6.5   6.5   6.5   6.5   6.5
Price Machine           10    10    10    10    10    10    10    10    10    10
Wage(Agr.)/Person      1.2   1.2   1.2   1.3   1.3   1.3   1.3   1.3   1.3   1.3
Wage(Flou.)/Person     1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.3   1.3
Wage(Bake.)/Person     1.3   1.3   1.3   1.4   1.4   1.4   1.4   1.4   1.4   1.4
Wage(Steel)/Person       1   1.1   1.2   1.3   1.5   1.5   1.5   1.5   1.5   1.5
Wage(Mach.)/Person     1.3   1.3   1.4   1.6   1.6   1.6   1.6   1.6   1.6   1.6
Wage(Gov.)/Person      1.4   1.4   1.4   1.4   1.5   1.5   1.5   1.5   1.5   1.5



capital investment of each agent in each term. Table 24.4 shows management 
and political parameters. 
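
The bookkeeping of Table 24.3 can be checked mechanically. The Python sketch
below transcribes the table and tests two identities that the printed figures
appear to satisfy: Sum equals the total of the seven sector columns, and
Machine available equals Sum plus Stock of Machine. Both identities are read
off from the numbers and are an assumption, not a rule stated in the text; the
tuple layout and the names are illustrative only.

```python
# Rows of Table 24.3, transcribed as
# (term, available, agri, flour, bakery, steel, machine, household, gov, total, stock)
TABLE_24_3 = [
    (1, 26, 2, 4, 1, 7, 11, 1, 0, 26, 0),
    (2, 35, 2, 5, 4, 10, 14, 0, 0, 35, 0),
    (3, 44, 2, 5, 6, 10, 19, 2, 0, 44, 0),
    (4, 55, 2, 10, 8, 10, 23, 1, 0, 54, 1),
    (5, 62, 3, 15, 10, 10, 22, 2, 0, 62, 0),
    (6, 75, 2, 15, 17, 12, 25, 3, 1, 75, 0),
    (7, 81, 4, 17, 15, 10, 28, 4, 3, 81, 0),
    (8, 81, 5, 21, 13, 16, 24, 2, 0, 81, 0),
    (9, 88, 5, 22, 13, 10, 33, 3, 2, 88, 0),
    (10, 97, 3, 30, 15, 13, 32, 3, 0, 96, 1),
]


def check_row(row):
    """Return True if the row satisfies both assumed identities."""
    term, available, *sectors, total, stock = row
    return sum(sectors) == total and available == total + stock


# Terms whose figures violate one of the assumed identities.
inconsistent_terms = [row[0] for row in TABLE_24_3 if not check_row(row)]
print(inconsistent_terms)
```

An empty list means that every term passes both checks.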

A player can observe several types of economic development across the
terms as he or she changes these parameters. A player can set different goals
depending on the social indexes of interest, such as the number of residents
per house, GDP per person, and food consumption per person.



24.3 Result of Simulation 

We show the results of economic development over ten terms. The following
figures show the results of the simulation by the bird's-eye-view model across
the terms under the parameters of the previous tables. Fig. 24.4 shows the
number of residents per house. Fig. 24.5 shows GDP per person and food
consumption per person. Fig. 24.6 shows the price index. Fig. 24.7 shows cash
in the government and issued national bonds that are accepted by the central
bank.

Fig. 24.4. The Numbers of Residents per House

Fig. 24.5. GDP (Real) per Person & Food Consumption (BRU) per Person
(axis: Term; legend: GDP (Real)/Person, Food Consumption (BRU)/Person)

Fig. 24.6. Price Index

Fig. 24.7. Cash in Government and Issued National Bond Accepted in Central
Bank (axis: Term; legend: Cash in Government, Accept of National Bond by
Central Bank)

Fig. 24.8. Product Stock of Steel and Machine
(legend: Product Stock of Steel (STU), Product Stock of Machine (MAU))



24.4 Conclusion 

We investigated an agent-based simulation model of a small national economy.
The model differs from the usual macroeconomic model: we assumed a bottom-up
state description of each economic agent by exchange algebra. We can add a
multi-agent decision-making mechanism to the model from the bottom-up point of
view, as shown in Fig. 24.2. We want to express and design the institutional
and structural varieties of real economies in agent-based models. This is a
first step in our research program.

25.1 Introduction

Recent advances in agent-based modeling and simulation have been
revolutionizing the social sciences and other research fields. The agent-based
approach enables us to deal with models that generate macroscopic phenomena by
allowing a number of agents to act at the micro level within the simulation.
In the social sciences, we can therefore trace and understand the internal
mechanisms of society. Since some interesting implications have been derived
from earlier research with the agent-based approach, expectations are rising
in the social sciences.

In the last several years, several tools for agent-based simulation have been
proposed: Swarm Simulation System [25.10], Ascape [25.11], RePast [25.12],
MAML [25.7] and so on. In particular, Swarm Simulation System has become one
of the best-known and fastest-growing toolkits in many research fields.
Although these tools have promoted the sharing of some kinds of components,
such as Graphical User Interfaces (GUI), among researchers, they have been
less successful in sharing and accumulating parts of simulation models. This
is because the basis they provide is too abstract for users to follow when
they build sharable components. To design sharable and reusable models,
domain-specific design is required at the level of the social model rather
than at the level of the abstract general-purpose model. To put it another
way, social scientists need not only an abstract basis, such as mathematical
operators, but also model components, such as production functions or
consumption functions in economics. Indeed, economists usually specify their
models using typical model components. There are, for example, several types
of production functions in economics: the Cobb-Douglas type, the CES type, and
the Translog type. Economists hardly ever make the model components from
scratch each time¹.
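
To make the idea of a reusable model component concrete, here is a minimal
Python sketch of two of the production functions named above, written so that
a simulation can swap one for the other. The functional forms are the common
textbook ones; the class names, parameter names and the `produce` helper are
assumptions made for this illustration and are not part of any toolkit
mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CobbDouglas:
    """Y = A * K**alpha * L**(1 - alpha), constant returns to scale."""
    scale: float  # total factor productivity A
    alpha: float  # output elasticity of capital, between 0 and 1

    def output(self, capital: float, labor: float) -> float:
        return self.scale * capital ** self.alpha * labor ** (1.0 - self.alpha)


@dataclass(frozen=True)
class CES:
    """Y = A * (delta * K**rho + (1 - delta) * L**rho) ** (1 / rho), rho != 0."""
    scale: float  # total factor productivity A
    delta: float  # distribution parameter, between 0 and 1
    rho: float    # substitution parameter, must be nonzero

    def output(self, capital: float, labor: float) -> float:
        inner = self.delta * capital ** self.rho + (1.0 - self.delta) * labor ** self.rho
        return self.scale * inner ** (1.0 / self.rho)


def produce(technology, capital: float, labor: float) -> float:
    """Any object with an output(capital, labor) method can be plugged in."""
    return technology.output(capital, labor)


print(produce(CobbDouglas(scale=1.0, alpha=0.5), 4.0, 9.0))
print(produce(CES(scale=1.0, delta=0.5, rho=1.0), 4.0, 9.0))
```

A producer model that calls only `produce` does not need to change when the
technology is replaced, which is the kind of sharing the paragraph argues for.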

We would therefore like to provide a model framework specialized for
agent-based models of economic society, incorporating the idea of an
object-oriented framework that defines the basic architecture of economic and
social models [25.8]². We call our model framework “Boxed Economy Foundation
Model”.



25.2 Model Framework for Agent-Based Economic 
Simulations 

Boxed Economy Foundation Model provides a framework for modeling the economic
society. The foundation model is an abstract model of a real society from the
viewpoint of the economy. We would like to suggest in particular that, in the
field of economic and social simulations, design with an object-oriented
framework is more significant than design simply with components or objects.
This is because the introduction of a framework makes it easier for simulation
builders to build, share, and co-improve economic simulations.

A framework is an architecture that is specialized to a certain domain. A
framework provides many kinds of plug-points (containers) to connect the
components that are implemented by the simulation builders in each simulation
(Fig. 25.1). It is usually difficult to combine components developed by
independent groups, because their assumptions are inconsistent with each
other. Frameworks are important for reuse and co-improvement because they
define a “context” for the components developed in the future.

To build a realistic model step by step, it is necessary to encourage
researchers and businesspeople from other areas to participate in the
development. Boxed Economy introduces the idea of a framework to simulate the
economic society and keep the architecture on one track. Therefore, the
simulation builders can make models in parallel as long as they keep to the
same framework, and they can concentrate on the objects related to their
specialty: consumer, corporation, and so on.



¹ Most simulation models in agent-based research are built from scratch each
time. Enormous development time and cost are required in this style.

² C. Bruun [25.4] has a similar motivation and also tries to build a model
framework for agent-based economic simulations. There are, however, critical
differences in regard to the agent design, as we will mention later.

Fig. 25.1. Object-Oriented Framework and Components



25.3 Boxed Economy Foundation Model

Boxed Economy Foundation Model defines the fundamental relationships among the
parts of an artificial economy model. Fig. 25.2 shows the classes and their
relationships in the Boxed Economy Foundation Model, expressed in Unified
Modeling Language (UML) [25.2]. The main part of the foundation model
currently contains 14 classes, which are called “foundation model classes”.
They are classified as follows³:

— EconomicActor, SocialGroup, Individual 

— Goods, Information, Possession 

— Behavior, BehaviorManagement, Memory, Needs 

— Relation, Path 

— Clock, Location

An “agent” can represent any autonomous subject in the economy. This means
that individuals and social groups, such as governments or corporations, are
all treated as “agents” in the model. The “agent” defined in the Boxed Economy
is formed by the following classes: [EconomicActor] as its core, [Behavior],
[BehaviorManagement] and [Memory]. [EconomicActor] interacts with the classes
that surround it and becomes an agent in the artificial economy.
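
The composition described above can be sketched in a few lines of Python. This
is only an illustration under assumptions: the text does not spell out what
[BehaviorManagement] does, so here it is taken to be a container that holds and
runs the behaviors of one actor; the method names and the `RecordPrice`
behavior are hypothetical.

```python
class Memory:
    """Knowledge stored in an actor (here simply a dictionary of facts)."""

    def __init__(self):
        self.facts = {}


class Behavior:
    """One unit of decision-making or action; subclasses override act()."""

    name = "behavior"

    def act(self, actor):
        raise NotImplementedError


class BehaviorManagement:
    """Assumed role: holds the behaviors of one actor and runs them in order."""

    def __init__(self):
        self._behaviors = {}

    def add(self, behavior):
        self._behaviors[behavior.name] = behavior

    def run_all(self, actor):
        return [b.act(actor) for b in self._behaviors.values()]


class EconomicActor:
    """The core; it becomes an agent once memory and behaviors are attached."""

    def __init__(self, name):
        self.name = name
        self.memory = Memory()
        self.behaviors = BehaviorManagement()


class RecordPrice(Behavior):
    """Hypothetical behavior: remember an observed price."""

    name = "record_price"

    def __init__(self, price):
        self.price = price

    def act(self, actor):
        actor.memory.facts["last_price"] = self.price
        return f"{actor.name} recorded price {self.price}"


shop = EconomicActor("shop")
shop.behaviors.add(RecordPrice(1.1))
log = shop.behaviors.run_all(shop)
print(log)
```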

In the rest of this section, we introduce the definitions of some classes,
their correspondence to the real society, and their relationships with other
classes in the model, in catalog style.



25.3.1 EconomicActor, SocialGroup, Individual 
[EconomicActor] 

Definition: An actor who carries out economic activities in the artificial
society.

Correspondence: Human or social group as consumer, corporation, bank,
government, etc.

³ The design and definitions of Boxed Economy Foundation Model are provisional
and might be changed in the future.

Fig. 25.2. Main Architecture of Boxed Economy Foundation Model



Explanation: [EconomicActor] is the core element that executes the economic
activities. [Behavior], [BehaviorManagement] and [Memory] are added in order
to create an “agent”. [EconomicActor] is the generalization of [Individual]
and [SocialGroup].

Related Class: [EconomicActor] owns one [Memory], one [BehaviorManagement] and
more than one [Behavior]. It also owns many [Goods] and exchanges them through
a [Path], which is created based on the [Relation] the actors have.

[SocialGroup]

Definition: A group which is formed by [EconomicActor]s.

Correspondence: Social group as corporation, regional community, etc.

Explanation: [SocialGroup] is one kind of [EconomicActor]. A [SocialGroup]
consists of [EconomicActor]s or other [SocialGroup]s. Note that it is possible
to have a [SocialGroup] inside another group.

Related Class: This class is extended from the [EconomicActor] class and
inherits all its characteristics; it holds [Memory], [Behavior],
[BehaviorManagement], [Goods], [Relation] and [Path].

[Individual]

Definition: A single human being in the artificial society.

Correspondence: Human being.

Explanation: [Individual] is one kind of [EconomicActor]. The difference
between [Individual] and [SocialGroup] is that [Individual] may have [Needs].
[Individual] is the minimum unit to form a [SocialGroup].

Related Class: This class is extended from the [EconomicActor] class and
inherits its characteristics; it therefore contains [Memory], [Behavior],
[BehaviorManagement], [Goods], [Relation], and [Path].
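
The three catalog entries above describe an inheritance hierarchy in which
groups may contain individuals or other groups. A minimal Python sketch of
that structure follows; the attribute names, the `individuals` helper and the
example actors are assumptions introduced for illustration.

```python
class EconomicActor:
    """Common base of individuals and social groups."""

    def __init__(self, name):
        self.name = name


class Individual(EconomicActor):
    """A single human being; only individuals may have needs."""

    def __init__(self, name, needs=()):
        super().__init__(name)
        self.needs = list(needs)


class SocialGroup(EconomicActor):
    """A group of economic actors, possibly containing other groups."""

    def __init__(self, name, members=()):
        super().__init__(name)
        self.members = list(members)

    def individuals(self):
        """Flatten nested groups down to the individuals they contain."""
        found = []
        for member in self.members:
            if isinstance(member, SocialGroup):
                found.extend(member.individuals())
            else:
                found.append(member)
        return found


alice = Individual("Alice", needs=["food"])
bob = Individual("Bob")
sales = SocialGroup("Sales Department", [alice])
corporation = SocialGroup("Corporation", [sales, bob])
print([person.name for person in corporation.individuals()])
```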



25.3.2 Goods, Information, Possession 



[Goods]

Definition: Everything that is owned or exchanged by an [EconomicActor]. It
can also be something that is invisible.

Correspondence: Commodities, service, money, etc.

Explanation: [Goods] has the following attributes: name, kind, visibility,
date of production, basic endurance, portability, divisibility, amount, unit
of measurement, etc. A carrier of information, and money as well, are treated
as kinds of [Goods].

Related Class: [Goods] is called [Possession] when it is owned by an
[EconomicActor]. [Information] is always exchanged with some kind of [Goods]
as a carrier, not by itself.



[Information]

Definition: Knowledge which is an expression of many facts.

Correspondence: Knowledge stored in documents, the contents of communications
and advertisements, etc.

Explanation: [Information] does not stand by itself, but is always contained
in [Goods]. For example, when paper contains the [Information], it is a
document, and when voice is the carrier, it is a verbal communication. When
information reaches the [EconomicActor], it is decoded into [Memory].

Related Class: [Information] is always exchanged with some kind of [Goods] as
a carrier, not by itself.

25.3.3 Behavior, BehaviorManagement, Memory, Needs

[Behavior]

Definition: An element to construct the decision and action of the economic
actor.

Correspondence: The corporate behaviors of strategic decision-making,
production, sales, etc., and the individual behaviors of purchase
decision-making, information processing, etc.

Explanation: Each decision-making process and each action is defined as a
behavior. Each [EconomicActor] is able to execute the decision-making and
actions that are defined by the [Behavior]s it has.

Related Class: It is held in [EconomicActor].

[Memory]

Definition: Knowledge that is stored in the economic actor.

Correspondence: Things that somebody knows, etc.

Explanation: [Memory] is referred to when the agent has to make a decision.
From time to time, the memory is refreshed by the agent's experience.

Related Class: It is stored in [EconomicActor].

[Needs]

Definition: A drive that motivates an individual to an action.

Correspondence: Human desire.

Explanation: [Needs] is something that an [Individual] holds as a mechanism of
action, but a [SocialGroup] does not have it. The state of lack drives the
[Individual] to some kind of action, by which the desire is fulfilled.

Related Class: It is held by [Individual].



25.3.4 Relation, Path 



[Relation]

Definition: A state in which an [EconomicActor] knows some other
[EconomicActor].

Correspondence: The relationship of family, friends, labor, neighborhood, etc.

Explanation: Having a [Relation] is a state in which communication is enabled.
A new [Relation] may be constructed from the [Information] that the agent
gains. A [Relation] is normally expressed as one-way, but when both actors
connect to each other it is two-way.

Related Class: It is held by an [EconomicActor].




[Path]

Definition: A path created from a relation in order to communicate with
another economic actor.

Correspondence: A path to exchange items or to communicate with others.

Explanation: Items or contents of verbal communication are exchanged through
this path. For example, a retailer opens a path to the customer to give the
item to him/her.

Related Class: An [EconomicActor] creates a path from its [Relation], and the
[Path] enables actors to pass [Goods] to one another.
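
The interplay of [Relation] and [Path] can be sketched as follows, assuming
that a path may only be opened toward an actor the sender already knows and
that goods move from sender to receiver along it. The method names (`know`,
`open_path`, `send`) and the retailer and customer actors are hypothetical.

```python
class Path:
    """A channel from one actor to another through which goods are passed."""

    def __init__(self, sender, receiver):
        self.sender = sender
        self.receiver = receiver

    def send(self, item):
        # Move the item from the sender's goods to the receiver's goods.
        self.sender.goods.remove(item)
        self.receiver.goods.append(item)


class EconomicActor:
    def __init__(self, name):
        self.name = name
        self.goods = []
        self.relations = set()  # names of actors this actor knows (one-way)

    def know(self, other):
        """Establish a one-way relation toward another actor."""
        self.relations.add(other.name)

    def open_path(self, other):
        """Create a path, which requires an existing relation."""
        if other.name not in self.relations:
            raise ValueError(f"{self.name} has no relation to {other.name}")
        return Path(self, other)


retailer = EconomicActor("retailer")
customer = EconomicActor("customer")
retailer.goods.append("bread")
retailer.know(customer)
retailer.open_path(customer).send("bread")
print(customer.goods)
```

Because the relation here is one-way, the customer cannot open a path back to
the retailer until it establishes its own relation.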



25.4 Applying Boxed Economy Foundation Model 

25.4.1 Modeling Behavior Rather than Agent 

When you create a simulation based on Boxed Economy, you describe the details
of the agents by using the class definitions introduced above.

We would like to emphasize that it is important to characterize the agent as
an object that has more than one behavior. This representation of the agent
has an advantage over conventional models, which handle the agent as a minimum
indivisible unit in a simulation [25.3][25.1][25.6]. The advantage is that it
becomes possible to describe an agent that plays more than one social role.
For example, most individuals act as “consumers” when they buy items from a
store, and act as “laborers” when they work to earn money. The point is that
our society does not contain subjects called consumers or laborers; it
contains only individual persons, who play the role of consumer or laborer in
each scene. In the Boxed Economy, we follow this idea and create the agent as
an individual person that has the behavior of a consumer; we do not create a
consumer agent. In summary, creating a model of an economic actor with the
Boxed Economy Foundation Model means modeling the behaviors that the economic
actor has.
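
The point that roles are behaviors of one individual, not separate agent
types, can be illustrated with a short Python sketch. The class layout, the
role names and the amounts are assumptions chosen for the example.

```python
class Individual:
    """One person who can hold several behaviors, one per social role."""

    def __init__(self, name, money):
        self.name = name
        self.money = money
        self.behaviors = {}

    def add_behavior(self, role, behavior):
        self.behaviors[role] = behavior

    def act(self, role, *args):
        # Delegate to the behavior registered for the requested role.
        return self.behaviors[role](self, *args)


def consume(person, price):
    """Consumer behavior: pay the price of an item."""
    person.money -= price
    return "bought"


def work(person, wage):
    """Laborer behavior: earn a wage."""
    person.money += wage
    return "worked"


# There is no Consumer or Laborer class, only one individual with two behaviors.
taro = Individual("Taro", money=5.0)
taro.add_behavior("consumer", consume)
taro.add_behavior("laborer", work)
taro.act("laborer", 1.5)
taro.act("consumer", 1.0)
print(taro.money)
```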

25.4.2 Flexibility on the Boundary of Agent 

Boxed Economy allows agents to change dynamically. In other words, an agent
based on the foundation model is able to decide its own boundary. There are
three ways of changing the boundary.

The first way is to increase or decrease the number of actors inside the
agent. A corporation agent, for example, can change its number of workers by
hiring and firing.

Fig. 25.3. Representation of Whole-

The second way of changing the boundary is to exchange, add, or remove the
behaviors that the agent has. Since the agent in the model is defined as an
object that has behaviors for making decisions or performing some kind of
action, the functional boundary of the agent can be changed by adding or
deleting its behaviors. For instance, if you want the seller agent to take on
part of the banking functions, this is realized by adding a banking behavior
to the seller agent.

The third way of changing the boundary is to generate a new agent (an
individual or a social group) or to remove an existing agent. Examples are
birth and death for individuals, marriage and divorce for families, and
foundation and bankruptcy for corporations.

With the abilities mentioned above, the agents in the simulation are able to
change and adjust themselves to the situation as time goes by. Since analysis
with an artificial economy often focuses on observing long-term movements in
the whole economy, we need to give the agents these abilities.
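
The three ways of changing an agent's boundary can be summarized in a small
Python sketch. The `Agent` and `Economy` classes and all names in the example
are hypothetical; behaviors are represented only by their names to keep the
sketch short.

```python
class Agent:
    def __init__(self, name):
        self.name = name
        self.members = []       # actors inside the agent
        self.behaviors = set()  # names of the behaviors the agent has


class Economy:
    """Registry of the agents that currently exist."""

    def __init__(self):
        self.agents = {}

    def create(self, name):
        agent = Agent(name)
        self.agents[name] = agent
        return agent

    def remove(self, name):
        del self.agents[name]


economy = Economy()
seller = economy.create("seller")
seller.behaviors.add("sales")

# 1. Change the number of actors inside the agent (hiring and firing).
seller.members.append("worker-1")
seller.members.append("worker-2")
seller.members.remove("worker-1")

# 2. Change the functional boundary by adding a behavior.
seller.behaviors.add("banking")

# 3. Generate a new agent and let it disappear again (foundation, bankruptcy).
economy.create("startup")
economy.remove("startup")

print(seller.members, sorted(seller.behaviors), list(economy.agents))
```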



25.4.3 Example: Sellers in Distribution Mechanism 

The distribution mechanism includes corporations that act as producer,
wholesaler, and retailer. The wholesaler and the retailer have mostly the same
behaviors, but the retailer sells its items only to consumers, whereas the
wholesaler resells products to anyone except consumers. In the Boxed Economy
we do not model the agents as wholesalers or retailers; instead, we define the
agents by dividing their decision-making and actions into behaviors
(Fig. 25.3). In this way, many subjects can have the same behavior, and the
agents become extensible.

In the foundation model, we can create a social group within a social group:
for instance, if you imagine departments in a corporation, both the
departments and the corporation are [SocialGroup]s (Fig. 25.4). Using this
idea, we can outsource some of the functions to others, or create a
transportation business that has only the function of transportation. In the
real world, there are movements toward outsourcing functions or merging whole
corporate structures, and we are ready to simulate such situations by modeling
them with behaviors⁴.

Fig. 25.4. Representation of Wholesaler or Retailer (in the Case of Department
Structure)



25.5 Conclusion 

In this paper, we proposed the concept and design of “Boxed Economy Foundation
Model”, which is a sharable model framework for agent-based economic
simulations. We should note that we have also developed “Boxed Economy
Simulation Platform”, which provides the simulation environment for simulation
models based on Boxed Economy Foundation Model [25.9]. The platform is
implemented in Java, which is portable and independent of the computer
platform, and will be opened to the public soon (Fig. 25.5).

Creating the foundation for social simulation research is too large a project
for our members to complete alone. We would like to realize it by
collaborating with many researchers in various fields. Please contact us at
http://www.boxed-economy.org/ if you are interested in our challenge.

⁴ The design that separates behaviors from the class is of great advantage not
only for building flexible social models but also for building flexible
software. Delegating a role to other objects, which is called “composition”,
is more flexible than inheritance, and is known to be close to the essence of
object-oriented design [25.5].

Fig. 25.5. Boxed Economy Simulation Platform

Rough sets were proposed by Z. Pawlak in 1980 as a way in which real-world
concepts can be approximated by human measurements. For example, in a
database, real-world concepts are approximated by combinations of attributes,
as lower and upper approximations. The formal study of this approximation can
be viewed as the computation of information granularity, which is closely
related to data mining, machine learning, multi-valued logic and fuzzy sets.

The workshop on rough sets and granular computing started on May 20 because of
the number of paper submissions (30). The workshop consisted of three invited
talks by Z. Pawlak, A. Skowron and S.K. Pal and 30 presentations of regular
papers (3: inductive logic programming (ILP), 3: decision making, 5: rule
induction, 3: fuzzy logic, 3: granular computing, 5: fundamentals of rough
sets, 6: applications, 2: conflict analysis). The number of attendees at this
workshop was 42 in total (22: Japan, 5: Poland, 3: India, 2: US, 1: Korea,
6: PhD students of Shimane University and Shimane Medical University).

In the invited talks, Pawlak discussed the relations between rough sets and
Bayesian inference, with the Lukasiewicz multi-valued logic as a key notion
bridging rough sets and Bayesian reasoning. In the second talk, Skowron
reviewed studies on rough sets, which play important roles in the estimation
of information granularity, and discussed the potential of granular computing
in multi-agent systems. In the final talk, Pal discussed the importance of
rough sets, fuzzy sets and genetic algorithms in data mining.

In the regular sessions, not only applications of rough sets but also several
fundamental studies on extensions of rough sets were presented. Theoretical
studies on combinations of rough sets with other methods, such as inductive
logic programming and fuzzy reasoning, were also shown. These invited talks
and regular papers showed that rough sets are widely used as an important tool
for data mining and data analysis, and that rough sets should be recognised as
a fundamental tool for theoretical studies on approximate reasoning.



T. Terano et al. (Eds.): JSAI 2001 Workshops, LNAI 2253, p. 239, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 




27. Bayes’ Theorem Revised — The Rough Set 
View 

Zdzislaw Pawlak 

Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, 
ul. Baltycka 5, 44 100 Gliwice, Poland 

Rough set theory offers new insight into Bayes’ theorem. The view of Bayes’
theorem offered by rough set theory is completely different from that used in
the Bayesian data analysis philosophy. It does not refer to either prior or
posterior probabilities, which are inherently associated with Bayesian
reasoning; instead, it reveals a probabilistic structure of the data being
analyzed. It states that any data set (decision table) satisfies the total
probability theorem and Bayes’ theorem. This property can be used directly to
draw conclusions from data without referring to prior knowledge and its
revision when new evidence becomes available. Thus, in the presented approach
the only source of knowledge is the data, and there is no need to assume any
prior knowledge besides the data. We simply look at what the data are telling
us. Consequently, we do not refer to any prior knowledge that is updated after
receiving some data.



27.1 Introduction 

This paper is an abbreviated version of [27.8].

Bayes’ theorem is the essence of statistical inference.

“The result of the Bayesian data analysis process is the posterior
distribution that represents a revision of the prior distribution in the light
of the evidence provided by the data” [27.5].

“Opinion as to the value of Bayes’ theorem as a basis for statistical
inference has swung between acceptance and rejection since its publication in
1763” [27.4].

Rough set theory offers new insight into Bayes’ theorem. The view of Bayes’
theorem offered by rough set theory is completely different from that used in
the Bayesian data analysis philosophy. It does not refer to either prior or
posterior probabilities, which are inherently associated with Bayesian
reasoning; instead, it reveals a probabilistic structure of the data being
analyzed. It states that any data set (decision table) satisfies the total
probability theorem and Bayes’ theorem. This property can be used directly to
draw conclusions from data without referring to prior knowledge and its
revision when new evidence becomes available. Thus, in the presented approach
the only source of knowledge is the data, and there is no need to assume any
prior knowledge besides the data. We simply look at what the data are telling
us. Consequently, we do not refer to any prior knowledge that is updated after
receiving some data.

Moreover, the rough set approach to Bayes’ theorem shows a close relationship
between the logic of implications and probability, which was first observed by
Lukasiewicz [27.6] and also independently studied by Adams [27.1] and others.
Bayes’ theorem in this context can be used to “invert” implications, i.e., to
give reasons for decisions. This feature is of utmost importance to data
mining and decision analysis, for it extends the class of problems which can
be considered in these domains.

Besides, we propose a new form of Bayes’ theorem in which the basic role is
played by the strength of decision rules (implications) derived from the data.
The strength of decision rules is computed from the data, or it can also be a
subjective assessment. This formulation gives a new look at the Bayesian
method of inference and also essentially simplifies computations.



27.2 Bayes’ Theorem 

In this section we recall the basic ideas of the Bayesian inference
philosophy, following recent books on Bayes’ theory [27.3, 27.4, 27.5].

In his paper [27.2] Bayes considered the following problem: “Given the number
of times in which an unknown event has happened and failed: required the
chance that the probability of its happening in a single trial lies somewhere
between any two degrees of probability that can be named.”

“The technical result at the heart of the essay is what we now know as Bayes’
theorem. However, from a purely formal perspective there is no obvious reason
why this essentially trivial probability result should continue to excite
interest” [27.3].

“In its simplest form, if H denotes an hypothesis and D denotes data, the
theorem says that

$$P(H \mid D) = P(D \mid H) \times P(H) / P(D).$$

With $P(H)$ regarded as a probabilistic statement of belief about H before
obtaining data D, the left-hand side $P(H \mid D)$ becomes a probabilistic
statement of belief about H after obtaining D. Having specified $P(D \mid H)$
and $P(D)$, the mechanism of the theorem provides a solution to the problem of
how to learn from data.

In this expression, $P(H)$, which tells us what is known about H without
knowing of the data, is called the prior distribution of H, or the
distribution of H a priori. Correspondingly, $P(H \mid D)$, which tells us
what is known about H given knowledge of the data, is called the posterior
distribution of H given D, or the distribution of H a posteriori” [27.3].

“A prior distribution, which is supposed to represent what is known about
unknown parameters before the data is available, plays an important role in
Bayesian analysis. Such a distribution can be used to represent prior
knowledge or relative ignorance” [27.4].

Let us illustrate the above by a simple example taken from [27.5]. 

Example 27.2.1. “Consider a physician’s diagnostic test for presence or
absence of some rare disease D, that only occurs in 0.1% of the population,
i.e., $P(D) = .001$. It follows that $P(\bar{D}) = .999$, where $\bar{D}$
indicates that a person does not have the disease. The probability of an event
before the evaluation of evidence through Bayes’ rule is often called the
prior probability. The prior probability that someone picked at random from
the population has the disease is therefore $P(D) = .001$.

Furthermore we denote a positive test result by $T^+$, and a negative test
result by $T^-$. The performance of the test is summarized in Table 27.1.



Table 27.1. Performance of diagnostic test

              $T^+$   $T^-$
$D$           0.95    0.05
$\bar{D}$     0.02    0.98



What is the probability that a patient has the disease, if the test result is
positive? First, notice that $\{D, \bar{D}\}$ is a partition of the outcome
space. We apply Bayes’ rule to obtain

$$P(D \mid T^+) = \frac{P(T^+ \mid D)\,P(D)}{P(T^+ \mid D)\,P(D) + P(T^+ \mid \bar{D})\,P(\bar{D})} = \frac{.95 \cdot .001}{.95 \cdot .001 + .02 \cdot .999} = .045.$$



Only 4.5% of the people with a positive test result actually have the disease.
On the other hand, the posterior probability (i.e., the probability after
evaluation of evidence) is 45 times as high as the prior probability”. □
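
The arithmetic of Example 27.2.1 can be reproduced with a few lines of Python.
The function and parameter names are illustrative; the three numbers are the
ones given in the example.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(D | T+) by Bayes' rule for the two-block partition {D, not D}."""
    # Total probability of a positive test over both blocks of the partition.
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence


p = posterior(prior=0.001, sensitivity=0.95, false_positive_rate=0.02)
print(round(p, 3))        # posterior probability of disease given T+
print(round(p / 0.001))   # ratio of posterior to prior
```

The two printed values correspond to the .045 and the factor of 45 quoted in
the example.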



27.3 Information Systems and Approximation of Sets 

In this section we define basic concepts of rough set theory: information 
system and approximation of sets. Rudiments of rough set theory can be 
found in [27.7, 27.10]. 

An information system is a data table, whose columns are labeled by 
attributes, rows are labeled by objects of interest and entries of the table are 
attribute values. 

Formally, by an information system we will understand a pair $S = (U, A)$,
where $U$ and $A$ are finite, nonempty sets called the universe and the set of
attributes, respectively. With every attribute $a \in A$ we associate a set
$V_a$ of its values, called the domain of $a$. Any subset $B$ of $A$
determines a binary relation $I(B)$ on $U$, which will be called an
indiscernibility relation, and is defined as follows: $(x, y) \in I(B)$ if and
only if $a(x) = a(y)$ for every $a \in B$, where $a(x)$ denotes the value of
attribute $a$ for element $x$. Obviously $I(B)$ is an equivalence relation.
The family of all equivalence classes of $I(B)$, i.e., the partition
determined by $B$, will be denoted by $U/I(B)$, or simply by $U/B$; an
equivalence class of $I(B)$, i.e., a block of the partition $U/B$, containing
$x$ will be denoted by $B(x)$.

If $(x, y)$ belongs to $I(B)$ we will say that $x$ and $y$ are B-indiscernible
(indiscernible with respect to $B$). Equivalence classes of the relation
$I(B)$ (or blocks of the partition $U/B$) are referred to as B-elementary sets
or B-granules.
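
As an illustration of the indiscernibility relation, the following Python
sketch computes the partition U/B of a small information system. The table,
its attribute names and the function name are hypothetical and serve only as
an example.

```python
from collections import defaultdict


def partition(universe, table, attributes):
    """Return U/B: blocks of objects with equal values on every attribute in B."""
    blocks = defaultdict(set)
    for x in universe:
        # Objects with the same tuple of attribute values are B-indiscernible.
        signature = tuple(table[x][a] for a in attributes)
        blocks[signature].add(x)
    return list(blocks.values())


# Hypothetical information system: object -> {attribute: value}
TABLE = {
    1: {"fever": "yes", "cough": "yes"},
    2: {"fever": "yes", "cough": "yes"},
    3: {"fever": "no", "cough": "yes"},
    4: {"fever": "no", "cough": "no"},
}
U = sorted(TABLE)

blocks = partition(U, TABLE, ["fever", "cough"])
print(sorted(sorted(block) for block in blocks))
```

Using fewer attributes gives coarser granules: with only "cough", objects 1, 2
and 3 fall into one block.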

If we distinguish in an information system two disjoint classes of attributes,
called condition and decision attributes, respectively, then the system will
be called a decision table and will be denoted by $S = (U, C, D)$, where $C$
and $D$ are disjoint sets of condition and decision attributes, respectively.

Thus the decision table determines decisions which must be taken, when 
some conditions are satisfied. In other words each row of the decision table 
specifies a decision rule which determines decisions in terms of conditions. 

Observe, that elements of the universe are in the case of decision tables 
simply labels of decision rules. 

Suppose we are given an information system $S = (U, A)$, $X \subseteq U$, and
$B \subseteq A$. Our task is to describe the set X in terms of attribute values
from B. To this end we define two operations assigning to every $X \subseteq U$
two sets $B_*(X)$ and $B^*(X)$, called the B-lower and the B-upper approximation
of X, respectively, and defined as follows:

$$B_*(X) = \bigcup_{x \in U} \{B(x) : B(x) \subseteq X\},$$

$$B^*(X) = \bigcup_{x \in U} \{B(x) : B(x) \cap X \neq \emptyset\}.$$

Hence, the B-lower approximation of a set is the union of all B-granules that
are included in the set, whereas the B-upper approximation of a set is the
union of all B-granules that have a nonempty intersection with the set. The
set

$$BN_B(X) = B^*(X) - B_*(X)$$

will be referred to as the B-boundary region of X.
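
The two approximations and the boundary region can be computed directly from these definitions. The following is a minimal sketch on hypothetical patient data; the helper names `granules`, `lower`, `upper` and `boundary` are assumptions introduced for the illustration.

```python
from collections import defaultdict

def granules(table, B):
    """Blocks of U/B: objects with equal values on every attribute in B."""
    blocks = defaultdict(set)
    for x, row in table.items():
        blocks[tuple(row[a] for a in B)].add(x)
    return list(blocks.values())

def lower(table, B, X):
    """B-lower approximation: union of the B-granules included in X."""
    return set().union(*(g for g in granules(table, B) if g <= X))

def upper(table, B, X):
    """B-upper approximation: union of the B-granules that intersect X."""
    return set().union(*(g for g in granules(table, B) if g & X))

def boundary(table, B, X):
    """B-boundary region: upper approximation minus lower approximation."""
    return upper(table, B, X) - lower(table, B, X)

# Hypothetical data: four patients, two symptoms; X is the set of flu cases.
table = {
    "p1": {"headache": "yes", "temperature": "high"},
    "p2": {"headache": "yes", "temperature": "high"},
    "p3": {"headache": "no", "temperature": "normal"},
    "p4": {"headache": "no", "temperature": "high"},
}
B = ["headache", "temperature"]
X = {"p1", "p3"}

print(lower(table, B, X), upper(table, B, X), boundary(table, B, X))
```

Patients p1 and p2 have identical symptoms but only p1 is in X, so their granule ends up in the boundary region and X is rough with respect to B.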

If the boundary region of X is the empty set, i.e., $BN_B(X) = \emptyset$, then X
is crisp (exact) with respect to B; in the opposite case, i.e., if
$BN_B(X) \neq \emptyset$, X is referred to as rough (inexact) with respect to B.



27.4 Rough Membership



Rough sets can also be defined by employing, instead of approximations, a rough
membership function [27.9], which is defined as follows:

$$\mu_X^B : U \to [0, 1]$$

and

$$\mu_X^B(x) = \frac{|B(x) \cap X|}{|B(x)|},$$

where $X \subseteq U$ and $B \subseteq A$.

The function measures the degree to which x belongs to X in view of the
information about x expressed by the set of attributes B.

The rough membership function can be used to define the approximations
and the boundary region of a set, as shown below:

$$B_*(X) = \{x \in U : \mu_X^B(x) = 1\},$$

$$B^*(X) = \{x \in U : \mu_X^B(x) > 0\},$$

$$BN_B(X) = \{x \in U : 0 < \mu_X^B(x) < 1\}.$$
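
As a small illustration, the rough membership function and the three regions derived from it can be sketched as follows. The patient data and the function names are hypothetical; exact fractions are used so that the comparisons with 0 and 1 are not affected by rounding.

```python
from fractions import Fraction

def granule(table, B, x):
    """B(x): the objects indiscernible from x with respect to B."""
    return {y for y, row in table.items()
            if all(row[a] == table[x][a] for a in B)}

def rough_membership(table, B, X, x):
    """Degree |B(x) & X| / |B(x)| to which x belongs to X, given B."""
    block = granule(table, B, x)
    return Fraction(len(block & X), len(block))

# Hypothetical data: four patients, two symptoms; X is the set of flu cases.
table = {
    "p1": {"headache": "yes", "temperature": "high"},
    "p2": {"headache": "yes", "temperature": "high"},
    "p3": {"headache": "no", "temperature": "normal"},
    "p4": {"headache": "no", "temperature": "high"},
}
B = ["headache", "temperature"]
X = {"p1", "p3"}

mu = {x: rough_membership(table, B, X, x) for x in table}

lower = {x for x, m in mu.items() if m == 1}          # membership exactly 1
upper = {x for x, m in mu.items() if m > 0}           # membership above 0
boundary = {x for x, m in mu.items() if 0 < m < 1}    # strictly in between
```

Note that p2 gets membership 1/2 although it is not in X: the value depends only on the granule of p2, not on p2 itself.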



27.5 Information Systems and Decision Rules 



Every decision table describes decisions (actions, results, etc.) determined
when some conditions are satisfied. In other words, each row of the decision
table specifies a decision rule which determines decisions in terms of
conditions.

In what follows we will describe decision rules more exactly.

Let $S = (U, C, D)$ be a decision table. Every $x \in U$ determines a
sequence $c_1(x), \ldots, c_n(x), d_1(x), \ldots, d_m(x)$, where
$\{c_1, \ldots, c_n\} = C$ and $\{d_1, \ldots, d_m\} = D$.

The sequence will be called a decision rule (induced by x) in S and denoted
by $c_1(x), \ldots, c_n(x) \to d_1(x), \ldots, d_m(x)$, or in short $C \to_x D$.

Decision rules are often presented as logical implications in the form
"if ... then ...".

A set of decision rules corresponding to a decision table will be called a 
decision algorithm. 

The number $\mathrm{supp}_x(C, D) = |C(x) \cap D(x)|$ will be called the support
of the decision rule $C \to_x D$, and the number

$$\sigma_x(C, D) = \frac{\mathrm{supp}_x(C, D)}{|U|}$$

will be referred to as the strength of the decision rule $C \to_x D$, where $|X|$
denotes the cardinality of X. With every decision rule $C \to_x D$ we associate
the certainty factor of the decision rule, denoted $\mathrm{cer}_x(C, D)$ and
defined as follows:

$$\mathrm{cer}_x(C, D) = \frac{|C(x) \cap D(x)|}{|C(x)|}
  = \frac{\mathrm{supp}_x(C, D)}{|C(x)|}
  = \frac{\sigma_x(C, D)}{\pi(C(x))},$$

where $\pi(C(x)) = \frac{|C(x)|}{|U|}$.

The certainty factor may be interpreted as the conditional probability that
y belongs to D(x) given that y belongs to C(x), symbolically $\pi_x(D \mid C)$.

If $\mathrm{cer}_x(C, D) = 1$, then $C \to_x D$ will be called a certain decision
rule in S; if $0 < \mathrm{cer}_x(C, D) < 1$, the decision rule will be referred
to as an uncertain decision rule in S.

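Besides, we will also use the coverage factor of the decision rule, denoted
$\mathrm{cov}_x(C, D)$ and defined as

$$\mathrm{cov}_x(C, D) = \frac{|C(x) \cap D(x)|}{|D(x)|}
  = \frac{\mathrm{supp}_x(C, D)}{|D(x)|}
  = \frac{\sigma_x(C, D)}{\pi(D(x))},$$

where $\pi(D(x)) = \frac{|D(x)|}{|U|}$.

Similarly,

$$\mathrm{cov}_x(C, D) = \pi_x(C \mid D).$$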

If $C \to_x D$ is a decision rule, then $D \to_x C$ will be called an inverse
decision rule. The inverse decision rules can be used to give explanations
(reasons) for decisions.

Let us observe that

$$\mathrm{cer}_x(C, D) = \mu_{D(x)}^{C}(x) \quad \text{and} \quad
  \mathrm{cov}_x(C, D) = \mu_{C(x)}^{D}(x).$$

That means that the certainty factor expresses the degree of membership of
x in the decision class D(x), given C, whereas the coverage factor expresses
the degree of membership of x in the condition class C(x), given D.
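
The four numbers attached to a decision rule can be read off a decision table by counting. Below is a minimal sketch, assuming each object is given as a pair (condition values, decision values); the six-object table and the name `rule_factors` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def rule_factors(rows):
    """rows: one (condition, decision) pair per object of the universe.
    Returns {rule: (support, strength, certainty, coverage)}."""
    n = len(rows)
    support = Counter(rows)                    # |C(x) & D(x)| for each rule
    cond_size = Counter(c for c, _ in rows)    # |C(x)|
    dec_size = Counter(d for _, d in rows)     # |D(x)|
    return {
        (c, d): (s,
                 Fraction(s, n),               # strength
                 Fraction(s, cond_size[c]),    # certainty
                 Fraction(s, dec_size[d]))     # coverage
        for (c, d), s in support.items()
    }

# Hypothetical decision table with six objects.
rows = [("high", "yes")] * 3 + [("high", "no")] + [("low", "no")] * 2
factors = rule_factors(rows)

# Rules with certainty factor 1 are the certain decision rules.
certain = [rule for rule, (_, _, cer, _) in factors.items() if cer == 1]
```

In this toy table the rule low -> no is certain, while high -> yes and high -> no are uncertain because objects with the condition value "high" are split between both decisions.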



27.6 Probabilistic Properties of Decision Tables 

Decision tables have important probabilistic properties which are discussed 
next. 

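Let $C \to_x D$ be a decision rule in S and let $\Gamma = C(x)$ and
$\Delta = D(x)$. Then the following properties are valid, where each sum ranges
over the decision rules $C \to_y D$ induced by the objects y of the indicated
class:

$$\sum_{y \in \Gamma} \mathrm{cer}_y(C, D) = 1 \tag{27.1}$$

$$\sum_{y \in \Delta} \mathrm{cov}_y(C, D) = 1 \tag{27.2}$$

$$\pi(D(x)) = \sum_{y \in \Delta} \mathrm{cer}_y(C, D) \cdot \pi(C(y))
  = \sum_{y \in \Delta} \sigma_y(C, D) \tag{27.3}$$

$$\pi(C(x)) = \sum_{y \in \Gamma} \mathrm{cov}_y(C, D) \cdot \pi(D(y))
  = \sum_{y \in \Gamma} \sigma_y(C, D) \tag{27.4}$$

$$\mathrm{cer}_x(C, D)
  = \frac{\mathrm{cov}_x(C, D) \cdot \pi(D(x))}
         {\sum_{y \in \Gamma} \mathrm{cov}_y(C, D) \cdot \pi(D(y))}
  = \frac{\sigma_x(C, D)}{\pi(C(x))} \tag{27.5}$$

$$\mathrm{cov}_x(C, D)
  = \frac{\mathrm{cer}_x(C, D) \cdot \pi(C(x))}
         {\sum_{y \in \Delta} \mathrm{cer}_y(C, D) \cdot \pi(C(y))}
  = \frac{\sigma_x(C, D)}{\pi(D(x))} \tag{27.6}$$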



That is, any decision table satisfies (27.1),...,(27.6). Observe that (27.3) and
(27.4) refer to the well-known total probability theorem, whereas (27.5) and
(27.6) refer to Bayes' theorem.

Thus, in order to compute the certainty and coverage factors of decision
rules according to formulas (27.5) and (27.6), it is enough to know the strength
(support) of all decision rules only. The strength of decision rules can be
computed from data or can be a subjective assessment.

Let us observe that the above properties are valid also for syntactic decision
rules, i.e., any decision algorithm satisfies (27.1),...,(27.6).

Thus, in what follows, we will use the concept of the decision table and 
the decision algorithm equivalently. 
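
To see why the Bayes-type formula for the certainty factor holds, a short derivation from the definitions may help. It is only a sketch, and it reads the sum as ranging over the decision rules y that share the condition class of x; the formula for the coverage factor follows in the same way with the roles of C and D exchanged.

```latex
\begin{align*}
\sigma_x(C,D)
  &= \frac{|C(x)\cap D(x)|}{|U|}
   = \frac{|C(x)\cap D(x)|}{|D(x)|}\cdot\frac{|D(x)|}{|U|}
   = \mathrm{cov}_x(C,D)\,\pi(D(x)), \\[4pt]
\pi(C(x))
  &= \sum_{y:\,C(y)=C(x)} \sigma_y(C,D)
   = \sum_{y:\,C(y)=C(x)} \mathrm{cov}_y(C,D)\,\pi(D(y)), \\[4pt]
\mathrm{cer}_x(C,D)
  &= \frac{\sigma_x(C,D)}{\pi(C(x))}
   = \frac{\mathrm{cov}_x(C,D)\,\pi(D(x))}
          {\sum_{y:\,C(y)=C(x)} \mathrm{cov}_y(C,D)\,\pi(D(y))}.
\end{align*}
```

The second line is the total probability step: the condition class C(x) is split into disjoint pieces, one for each decision rule that shares this condition, and the strengths of those rules add up to its probability.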



27.7 Decision Tables and Flow Graphs 

With every decision table we associate a flow graph, i.e., a directed acyclic
graph defined as follows: to every decision rule $C \to_x D$ we assign a directed
branch x connecting the input node C(x) and the output node D(x). The strength
of the decision rule represents the throughflow of the corresponding branch.
The throughflow of the graph is governed by formulas (27.1),...,(27.6).

Formulas (27.1) and (27.2) say that the outflow of an input node or an output
node is equal to its inflow. Formula (27.3) states that the outflow of the
output node amounts to the sum of its inflows, whereas formula (27.4) says that
the sum of outflows of the input node equals its inflow. Finally, formulas
(27.5) and (27.6) reveal how throughflow in the flow graph is distributed
between its inputs and outputs.
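
The flow-graph reading can be made concrete with a small sketch. The branch strengths below are hypothetical, as are the function names; each branch carries the strength of one decision rule, and the certainty and coverage factors are obtained as shares of the node flows.

```python
from collections import defaultdict
from fractions import Fraction

def node_flows(branches):
    """branches: {(input_node, output_node): strength}.
    Returns (outflow of every input node, inflow of every output node)."""
    outflow = defaultdict(Fraction)
    inflow = defaultdict(Fraction)
    for (src, dst), sigma in branches.items():
        outflow[src] += sigma
        inflow[dst] += sigma
    return dict(outflow), dict(inflow)

def distribute(branches):
    """For each branch: (share of its input node's outflow,
    share of its output node's inflow), i.e. (certainty, coverage)."""
    outflow, inflow = node_flows(branches)
    return {
        (src, dst): (sigma / outflow[src], sigma / inflow[dst])
        for (src, dst), sigma in branches.items()
    }

# Hypothetical flow graph with two input and two output nodes.
branches = {
    ("c1", "d1"): Fraction(3, 10),
    ("c1", "d2"): Fraction(2, 10),
    ("c2", "d1"): Fraction(1, 10),
    ("c2", "d2"): Fraction(4, 10),
}

outflow, inflow = node_flows(branches)
shares = distribute(branches)
```

The total flow entering the graph through the input nodes equals the total flow leaving it through the output nodes, and both equal 1 because the strengths of all rules sum to 1.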



27.8 Comparison of Bayesian and Rough Set Approach 

Now we will illustrate the ideas considered in the previous sections by means
of the example considered in section 2. The examples are intended to show
clearly the difference between the "classical" Bayesian approach and that
proposed by the rough set philosophy.

Observe that we are not using data to verify prior knowledge, as is inherent
in Bayesian data analysis; rather, the rough set approach shows that any
decision table satisfies Bayes' theorem and the total probability theorem.
These properties form the basis for drawing conclusions from data, without
referring either to prior or to posterior knowledge.

Example 27.8.1. This example, which is a modification of example 1 given
in section 2, will clearly show the different role of Bayes' theorem in
classical statistical inference and in rough set based data analysis.

Let us consider the data table shown in Table 27.2.



Table 27.2. Data table

          T+       T-
D         95        5
¬D      1998    97902



In Table 27.2, instead of probabilities like those given in Table 27.1, numbers
of patients belonging to the corresponding classes are given. Thus we start
from the original data (not probabilities) representing the outcome of the test.

Now from Table 27.2 we create a decision table and compute the strength of
the decision rules. The results are shown in Table 27.3.

In Table 27.3, D is the condition attribute, whereas T is the decision
attribute. The decision table is meant to represent a "cause-effect" relation
between the disease and the result of the test. That is, we expect that the
disease causes a positive test result and lack of the disease results in a
negative test result.








Table 27.3. Decision table

fact    D    T    support    strength
1       +    +         95     0.00095
2       -    +       1998     0.01998
3       +    -          5     0.00005
4       -    -      97902     0.97902



The decision algorithm is given below:

1') if (disease, yes) then (test, positive)

2') if (disease, no) then (test, positive)

3') if (disease, yes) then (test, negative)

4') if (disease, no) then (test, negative)

The certainty and coverage factors of the decision rules for the above decision
algorithm are given in Table 27.4.



Table 27.4. Certainty and coverage

rule    strength    certainty    coverage
1        0.00095         0.95     0.04500
2        0.01998         0.02     0.95500
3        0.00005         0.05     0.00005
4        0.97902         0.98     0.99995
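
The entries of Table 27.4 can be recomputed from the supports in Table 27.3 alone. The sketch below does this with plain counting; the dictionary layout and the function name `factors` are assumptions made for the illustration. Computed without rounding, the coverage factors of rules 1 and 2 come out as roughly 0.0454 and 0.9546, which the table reports rounded to 0.045 and 0.955.

```python
# Supports from Table 27.3, keyed by (disease, test), in rule order 1..4.
support = {
    ("+", "+"): 95,
    ("-", "+"): 1998,
    ("+", "-"): 5,
    ("-", "-"): 97902,
}

total = sum(support.values())

def factors(d, t):
    """Return (strength, certainty, coverage) of the rule disease=d -> test=t."""
    cond = sum(s for (dd, _), s in support.items() if dd == d)  # |C(x)|
    dec = sum(s for (_, tt), s in support.items() if tt == t)   # |D(x)|
    s = support[(d, t)]
    return s / total, s / cond, s / dec

for rule, (d, t) in enumerate(support, start=1):
    strength, certainty, coverage = factors(d, t)
    print(f"{rule}  {strength:.5f}  {certainty:.2f}  {coverage:.5f}")
```

Only the four supports enter the computation, which is the point made earlier: knowing the strength of every decision rule is enough to obtain all certainty and coverage factors.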



The decision algorithm and the certainty factors lead to the following
conclusions:

- 95% of persons suffering from the disease have positive test results

- 2% of healthy persons have positive test results

- 5% of persons suffering from the disease have negative test results

- 98% of healthy persons have negative test results

That is to say that if a person has the disease, most probably the test result
will be positive, and if a person is healthy, the test result will most probably
be negative. In other words, in view of the data there is a causal relationship
between the disease and the test result.

The inverse decision algorithm is the following:

1) if (test, positive) then (disease, yes)

2) if (test, positive) then (disease, no)

3) if (test, negative) then (disease, yes)

4) if (test, negative) then (disease, no)

From the coverage factors we can conclude the following:

- 4.5% of persons with positive test results are suffering from the disease

- 95.5% of persons with positive test results are not suffering from the
disease

- 0.005% of persons with negative test results are suffering from the disease

- 99.995% of persons with negative test results are not suffering from the
disease

That means that a positive test result does not necessarily indicate the
disease, but a negative test result most probably (almost for certain) does
indicate lack of the disease.

It is easily seen from Table 27.4 that the negative test result almost exactly
identifies healthy patients.

For the remaining rules the accuracy is much smaller and consequently
test results do not indicate the presence or absence of the disease. □

Examples 1 and 2 clearly show the difference between Bayesian data analysis
and the rough set approach. In Bayesian inference the data are used to update
prior knowledge (probability) into a posterior probability, whereas rough sets
are used to understand what the data are telling us.



27.9 Conclusion 

Examples 1 and 2 make it easy to see the difference between employing
Bayes' theorem in statistical reasoning and the role of Bayes' theorem in
rough set based data analysis.

Bayesian inference consists in updating prior probabilities by means of 
data to posterior probabilities. 

In the rough set approach Bayes’ theorem reveals data patterns, which 
are used next to draw conclusions from data, in form of decision rules. 

In other words, classical Bayesian inference is based rather on subjective 
prior probability, whereas the rough set view on Bayes’ theorem refers to 
objective probability inherently associated with decision tables. 

Acknowledgments. The author wishes to express his gratitude to Professor
Andrzej Skowron for many critical remarks.



28. Toward Intelligent Systems: Calculi of
Information Granules

Andrzej Skowron

We present an approach based on calculi of information granules as a basis for
approximate reasoning in intelligent systems. Approximate reasoning schemes
are defined by means of information granule construction schemes satisfying
some robustness constraints. In distributed environments such schemes are
extended to rough neural networks. Problems of learning in rough neural
networks from experimental data and background knowledge are discussed.
The approach is based on rough mereology.



28.1 Introduction 

Computing with Words (CWW) (see, e.g., [28.38], [28.39], [28.40]) is one
among a collection of recently emerging computing paradigms. The goal of
this new research direction is to build foundations for future intelligent
computers and information systems performing computations on words from
natural language representing concepts rather than on numbers.

Information granulation belongs to the intensively studied topics in soft
computing (see, e.g., [28.38], [28.39], [28.40]). One of the recently emerging
approaches to dealing with information granulation is based on information
granule calculi (see, e.g., [28.24], [28.33]). The development of such calculi
is important for making progress in many areas like object identification by
autonomous systems (see, e.g., [28.3], [28.36]), web mining (see, e.g., [28.8]),
spatial reasoning (see, e.g., [28.4]) or sensor fusion (see, e.g., [28.2],
[28.16], [28.19]).

One way to achieve CWW is through Granular Computing (GC). The 
main concepts of GC are related to information granulation and in particular 
to information granules [28.24]. 

Any approach to information granulation should make it possible to define
complex information granules (e.g., in spatial and temporal reasoning, one
should be able to determine if the situation on the road (see Fig. 28.1) is
safe on the basis of sensor measurements, or to classify situations in complex
games like soccer [28.35]). These complex information granules constitute a
form of information fusion. Any calculus of complex information granules
should make it possible to (i) deal with vagueness of information granules,
(ii) develop strategies for inducing multi-layered schemes of complex granule
construction, (iii) derive information granule construction schemes that are
robust (stable) with respect to deviations of the granules from which they are
constructed, and (iv) develop adaptive strategies for reconstruction of induced
schemes of complex information granule synthesis.



Fig. 28.1. Classification of situations



To deal with vagueness, one can adopt fuzzy set theory [28.37] or rough
set theory [28.15], either separately or in combination [28.13]. The second
requirement is related to the problem of understanding of reasoning from
measurements to perception (see, e.g., [28.40]) and to concept approximation
learning in layered learning [28.35], as well as to fusion of information from
different sources (see, e.g., [28.38], [28.39], [28.40]). The importance of
searching for Approximate Reasoning Schemes (AR-schemes, for short) as schemes
of new information granule construction is stressed in rough mereology (see,
e.g., [28.20], [28.21], [28.22], [28.26], [28.27]). In general, this leads to
hierarchical schemes of new information granule construction. This process
is closely related to ideas of co-operation, negotiations and conflict
resolution in multi-agent systems [28.7]. Among important topics studied in
relation to AR-schemes are methods for specifying operations on information
granules, in particular for their construction from data and background
knowledge, and methods for inducing these hierarchical schemes of information
granule construction. One of the possible approaches is to learn such schemes
using evolutionary strategies [28.10]. Robustness of the scheme means that any
scheme produces a higher order information granule that is a clump (e.g., a
set) of close information granules rather than a single information granule.
Such a clump is constructed by means of the scheme from the Cartesian product
of input clumps (e.g., clusters) satisfying some constraints. The input clumps
are defined by deviations (up to acceptable degrees) of input information
granules.

It is worthwhile to mention that modeling complex phenomena requires the
use of complex information granules representing local models (perceived by
local agents), which next should be fused. This process involves negotiations
between agents [28.7] to resolve contradictions and conflicts in local
modeling. This kind of modeling will become more and more important in solving
complex real-life problems which we are unable to model using traditional
analytical approaches. If the latter approaches can be applied to modeling of
such problems, they lead to exact models. However, the assumptions needed to
build them for complex real-life problems often cause the resulting solutions
to be too far from reality to be accepted as solutions of such problems.

Let us also observe, using multi-agent terminology, that local agents perform
operations on information granules from granule sets that are understandable
by them. Hence, granules submitted as arguments by other agents
should be approximated by means of properly tuned approximation spaces
creating interfaces between agents. The process of tuning the parameters of
the approximation spaces [28.32], [28.27] in AR-schemes corresponds to the
tuning of weights in neural networks. The methods for inducing AR-schemes
transforming information granules into information granules, studied using
rough set (see, e.g., [28.15], [28.9]) and rough mereological methods in
hybridization with other soft computing approaches, create a core for Rough
Neurocomputing (RNC) (see, e.g., [28.14], [28.27]). In RNC, computations
are performed on information granules.

Another important problem concerns relationships between information
granules and words (linguistic terms) in a natural language, and also the
possibility of using induced AR-schemes as schemes matching, up to a
satisfactory degree, reasoning schemes in natural language. Further research in
this direction will create strong links between RNC and CWW. The results of
such research will be of great importance for many applications (e.g., web
mining problems, Fig. 28.2).




Fig. 28.2. Web mining 



RNC attempts to define information granules using rough sets [28.15],
[28.9] and rough mereology (see, e.g., [28.21], [28.22], [28.26], [28.27]),
introduced to deal with vague concepts, in hybridization with other soft
computing methods like neural networks [28.29], fuzzy sets [28.13], [28.37],
[28.39] and evolutionary programming [28.14], [28.10]. The methods based on the
above mentioned approaches can be used for constructing more complex
information granules by means of schemes analogous to neural networks.

We outline a rough neurocomputing model as a basis for granular computing.



28.2 AR-Schemes

AR-schemes are the basic constructs used in RNC. We assume each agent
ag from a given collection Ag of agents [28.7] is equipped with a system of
information granules S(ag) specifying the information granules the agent ag is
perceiving and the inclusion (or closeness) relations to a degree used by ag to
measure the degree of inclusion (or closeness) between information granules.
A formal definition of an information granule system can be found, e.g.,
in [28.31]. Using such a system S(ag), the agent ag creates a representation
for all components of S(ag). The details of such a representation can be
found, e.g., in [28.22], [28.24]. From such representations agents are able to
extract local schemes of approximate reasoning called productions. Algorithmic
methods for extracting such productions from data are discussed in [28.21],
[28.30], [28.34], [28.17], [28.18]. The left hand side of each production is (in
the simplest case) of the form

$$(st_1(ag), (\epsilon_1^{(1)}, \ldots, \epsilon_r^{(1)})), \ldots,
  (st_k(ag), (\epsilon_1^{(k)}, \ldots, \epsilon_r^{(k)}))$$

and the right hand side is of the form $(st(ag), (\epsilon_1, \ldots, \epsilon_r))$
for some positive integers k, r.

Such a production represents information about an operation o which
can be performed by the agent ag. In the production, k denotes the arity
of the operation. The operation o represented by the production transforms the
standard (prototype) input information granules $st_1(ag), \ldots, st_k(ag)$ into
the standard (prototype) information granule st(ag). Moreover, if input
information granules $g_1, \ldots, g_k$ are close to $st_1(ag), \ldots, st_k(ag)$
to degrees $\epsilon_j^{(1)}, \ldots, \epsilon_j^{(k)}$, then the result of the
operation o on information granules $g_1, \ldots, g_k$ is close to the standard
st(ag) to a degree at least $\epsilon_j$, where $1 \le j \le r$. Standard
(prototype) granules can be interpreted in different ways. In particular, they
can correspond to concept names in natural language.
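
The way a production links input degrees to a guaranteed output degree can be illustrated with a small sketch. It rests on several assumptions that are not part of the text: degrees are taken to be real numbers in [0, 1], "close to a degree" is read as "at least that degree", and the class `Production` with its method `guaranteed_degree` is a hypothetical name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Production:
    # One tuple of r required degrees per input standard (k tuples in total).
    input_thresholds: tuple
    # The r output degrees guaranteed when the matching inputs are met.
    output_thresholds: tuple

    def guaranteed_degree(self, closeness):
        """closeness[i]: degree to which the i-th argument granule is close
        to the i-th input standard. Returns the best output degree the
        production guarantees, or None if no threshold level is met."""
        best = None
        for j, eps in enumerate(self.output_thresholds):
            # Level j applies only if every input reaches its j-th degree.
            if all(c >= thr[j]
                   for c, thr in zip(closeness, self.input_thresholds)):
                best = eps if best is None else max(best, eps)
        return best

# Hypothetical production with k = 2 inputs and r = 2 threshold levels.
p = Production(
    input_thresholds=((0.9, 0.7), (0.8, 0.6)),
    output_thresholds=(0.85, 0.5),
)

strong = p.guaranteed_degree((0.95, 0.82))
weak = p.guaranteed_degree((0.75, 0.65))
none = p.guaranteed_degree((0.5, 0.9))
```

With real-valued degrees the check is a simple comparison; for richer degree structures the comparison `>=` would have to be replaced by the ordering of that structure.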

The productions described above are basic components of a reasoning system
over an agent set Ag. An important property of such productions is
that they are expected to be discovered from available experimental data
and background knowledge. Let us also observe that the degree structure is
not necessarily restricted to reals from the interval [0, 1]. The inclusion
degrees can have a structure of complex information granules used to represent
the degree of inclusion. It is worthwhile to mention that the productions can
also be interpreted as a constructive description of some operations on fuzzy
sets. The methods for such constructive description are based on rough sets
and Boolean reasoning (see, e.g., [28.9], [28.15]).

AR-schemes can be treated as derivations obtained by using productions
from different agents. The relevant derivations defining AR-schemes satisfy
a so-called robustness (or stability) condition. It means that at any node
of the derivation the inclusion (or closeness) degree of the constructed granule
to the prototype (standard) granule is higher than required by the production
to which the result should be sent. This makes it possible to obtain a
sufficient robustness condition for whole derivations. For details the reader
is referred to, e.g., [28.22], [28.24], [28.25], [28.26]. In the case where
standards are interpreted as concept names in natural language and a reasoning
scheme in natural language over the standard concepts is given, the
corresponding AR-scheme represents a cluster of reasonings (constructions)
approximately following (by means of other information granule systems) the
reasoning in natural language.



28.3 Rough Neural Networks 

We extend the AR-schemes for synthesis of complex objects (or granules)
developed in [28.24] and [28.22] by adding one important component. As a result
we obtain granule construction schemes that can be treated as a generalization
of neural network models. The main idea is that granules sent by one
agent to another are not, in general, exactly understandable by the receiving
agent. This is because these agents use different languages, and usually there
does not exist any translation (from the sender language to the receiver
language) preserving exactly the semantic meaning of formulas. Hence, it is
necessary to construct interfaces that will make it possible to understand
received granules approximately. These interfaces can be, in the simplest case,
constructed on the basis of information exchanged by agents and stored in
the form of decision data tables. From such tables the approximations of
concepts can be constructed using the rough set approach [28.33]. In general,
it is a complex process because a high quality approximation of concepts can
often be obtained only in dialog (involving negotiations, conflict resolutions
and cooperation) among agents. In this process the approximation can be
constructed gradually as the dialog progresses. In our model we assume that
for any n-ary operation o(ag) of an agent ag there are approximation spaces
$AS_1(o(ag), in), \ldots, AS_n(o(ag), in)$ which will filter (approximate) the
granules received by the agent for performing the operation o(ag). In turn, the
granule sent by the agent after performing the operation is filtered
(approximated) by the approximation space $AS(o(ag), out)$. These approximation
spaces are parameterized. The parameters are used to optimize the size of
neighborhoods in these spaces as well as the inclusion relation [28.26]. A
granule approximation quality is taken as the optimization criterion.
Approximation spaces attached to any operation of ag correspond to neuron
weights in neural networks, whereas the operation performed by the agent ag
on information granules corresponds to the operation realized on vectors of
real numbers by the neuron. The generalized scheme of agents returns
a granule in response to input information granules. It can be, for example, a
cluster of elementary granules. Hence, our schemes realize much more general
computations than neural networks operating on vectors of real numbers.

We call extended schemes for complex object construction rough neural 
networks (for complex object construction). The problem of deriving such 
schemes is closely related to perception (see, e.g., [28.1], [28.40]). The stability 
of such networks corresponds to the resistance to noise of classical neural 
networks. 

Let us observe that in our approach the deductive systems are substituted
by production systems of agents linked by approximation spaces, communication
strategies and mechanisms of derivation of AR-schemes. This revision of
classical logical notions seems to be important for solving complex problems
in distributed environments.



28.4 Decomposition of Information Granules 

Information granule decomposition methods are important components of
methods for inducing AR-schemes from data and background knowledge.
Such methods are used to extract from data local decomposition schemes
called productions [28.25]. The AR-schemes are constructed by means of
productions. The decomposition methods are based on searching for the parts
of information granules that can be used to construct relevant higher level
patterns matching the target granule up to a satisfactory degree.

One can distinguish two kinds of parts (represented, e.g., by sub-formulas
or sub-terms) of AR-schemes. Parts of the first type are represented by
expressions from a language, called the domestic language $L_d$, that has known
semantics (consider, for example, semantics defined in a given information
system [28.15]). Parts of the second type of AR-scheme are from a language,
called the foreign language $L_f$ (e.g., natural language), that has semantics
definable only in an approximate way (e.g., by means of patterns extracted
using rough, fuzzy, rough-fuzzy or other approaches). For example, the parts
of the second kind of scheme can be interpreted as soft properties of sensor
measurements [28.3].

For a given expression e, representing a given scheme that consists of
sub-expressions from $L_f$, it is first necessary to search for relevant
approximations in $L_d$ of the foreign parts from $L_f$, and next to derive
global patterns from the whole expression after replacing the foreign parts by
their approximations. This can be a multilevel process, i.e., we are facing
problems of discovered pattern propagation through several domestic-foreign
layers.

Productions from which AR-schemes are built can be induced from data
and background knowledge by pattern extraction strategies. Let us consider
some such strategies. The first one makes it possible to search for relevant
approximations of parts using the rough set approach. This means that each
part from $L_f$ can be replaced by its lower or upper approximation with
respect to a set B of attributes. The approximation is constructed on the
basis of a relevant data table [28.15], [28.9]. With the second strategy, parts
from $L_f$ are partitioned into a number of sub-parts corresponding to cuts
(or the set theoretical differences between cuts) of fuzzy sets representing
vague concepts, and each sub-part is approximated by means of rough set
methods. The third strategy is based on searching for patterns sufficiently
included in foreign parts. In all cases, the extracted approximations replace
foreign parts in the scheme, and candidates for global patterns are derived
from the scheme obtained after the replacement. Searching for relevant global
patterns is a complex task because many parameters should be tuned, e.g.,
the set of relevant features used in approximation, relevant approximation
operators, the number and distribution of objects from the universe of objects
among different cuts and so on. One can use evolutionary techniques [28.10]
in searching for (semi-) optimal patterns in the decomposition.
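
The second strategy, approximating cuts of a fuzzy set by rough set methods, can be sketched as follows. Everything concrete here is an assumption made for illustration: the sensor table, the fuzzy membership values of the vague concept, the chosen cut levels and the function names.

```python
from collections import defaultdict

def granules(table, B):
    """Blocks of objects that agree on every attribute in B."""
    blocks = defaultdict(set)
    for x, row in table.items():
        blocks[tuple(row[a] for a in B)].add(x)
    return list(blocks.values())

def alpha_cut(membership, alpha):
    """Objects whose fuzzy membership is at least alpha."""
    return {x for x, m in membership.items() if m >= alpha}

def approximate_cuts(table, B, membership, alphas):
    """Map each cut level to the (lower, upper) approximation of that cut."""
    gs = granules(table, B)
    result = {}
    for alpha in alphas:
        cut = alpha_cut(membership, alpha)
        lower = set().union(*(g for g in gs if g <= cut))
        upper = set().union(*(g for g in gs if g & cut))
        result[alpha] = (lower, upper)
    return result

# Hypothetical sensor table and fuzzy membership in the concept "dangerous".
table = {
    "o1": {"speed": "high"},
    "o2": {"speed": "high"},
    "o3": {"speed": "low"},
    "o4": {"speed": "low"},
}
membership = {"o1": 0.9, "o2": 0.6, "o3": 0.4, "o4": 0.1}

result = approximate_cuts(table, ["speed"], membership, [0.3, 0.5, 0.8])
```

In this toy case only the cut at level 0.5 is described exactly by the attribute "speed"; the other two cuts have nonempty boundary regions, which is the kind of information a search over cut levels and attribute sets would use.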

It has been shown that the decomposition strategies can be based on the
developed rough set methods for decision rule generation and Boolean
reasoning [28.21], [28.12], [28.17], [28.33]. In particular, methods for
decomposition based on background knowledge can be developed [28.30], [28.18].

Conclusions. We have discussed a methodology for synthesis of AR-schemes
and rough neural networks. For more details the reader is referred to [28.21],
[28.22], [28.23], [28.24], [28.26], [28.27], [28.32], [28.33], [28.34].

We enclose a list of research directions related to the synthesis and analysis
of AR-schemes and rough neural networks.

1. Developing foundations for information granule systems. Certainly, still
   more work is needed to develop solid foundations for synthesis and
   analysis of information granule systems. In particular, methods for
   construction of hierarchical information granule systems, and methods for
   representation of such systems should be developed.

2. Algorithmic methods for inducing parameterized productions. Some
   methods have already been reported, such as discovery of rough mereological
   connectives from data (see, e.g., [28.21]) or methods based on
   decomposition (see, e.g., [28.22], [28.30], [28.34], [28.17]). However,
   these are only initial steps toward algorithmic methods for inducing
   parameterized productions from data. One interesting problem is to
   determine how such productions can be extracted from data and background
   knowledge. A method in this direction has been proposed in [28.3].

3. Algorithmic methods for synthesis of AR-schemes. It was observed (see,
   e.g., [28.22], [28.27]) that problems of negotiations and conflict
   resolutions are of great importance for synthesis of AR-schemes. The problem
   arises, e.g., when we are searching in a given set of agents for a granule
   sufficiently included in or close to a given one. These agents, often
   working with different systems of information granules, can derive different
   granules, and their fusion will be necessary to obtain the relevant output
   granule. In the fusion process, negotiations and conflict resolutions
   are necessary. Much more work should be done in this direction by using
   the existing results on negotiations and conflict resolution. In particular,
   Boolean reasoning methods seem to be promising ([28.22]) for solving
   such problems. Another problem is related to the size of production sets.
   These sets can be of large size, and it is important to develop learning
   methods for extracting small candidate production sets, in the process of
   extension of temporary derivations, out of huge production sets. For
   solving this kind of problem, methods for clustering of productions should
   be developed to reduce the size of production sets. Moreover, dialog and
   cooperation strategies between agents can help to reduce the search space
   in the process of AR-scheme construction from productions.

4. Algorithmic methods for learning in rough neural networks. A basic
problem in rough neural networks is related to selecting relevant
approximation spaces and to parameter tuning. One can also investigate to what
extent the existing methods for classical neural networks can be used for
learning in rough neural networks. However, it seems that new approaches and
methods for learning in rough neural networks should be developed to deal
with real-life applications. In particular, this is due to the fact that
high-quality approximations of concepts can often be obtained only through
dialog and negotiation processes among agents, in which the
concept approximation is gradually constructed. Hence, for rough neural
networks, learning methods based on dialog, negotiation and conflict resolution
should be developed. In some cases, one can directly use rough set and
Boolean reasoning methods (see, e.g., [28.33]). However, more advanced
cases need new methods. In particular, hybrid methods based on rough
and fuzzy approaches can bring new results [28.13].

5. Fusion methods in rough neurons. A basic problem in rough neurons
is the fusion of the inputs (information) derived from information
granules. This fusion makes it possible to contribute to the construction of new
granules. In the case where the granule constructed by a rough neuron
consists of characteristic signal values made by relevant sensors, a step in
the direction of solving the fusion problem can be found in [28.19], [28.6].

The relevance of fuzzy logic, artificial neural networks, genetic algorithms and
rough sets to pattern recognition and image processing problems is described
through examples. Different integrations of these soft computing tools
are illustrated. An evolutionary rough fuzzy network, which is based on a
modular principle, is explained, with its various characteristics, as an example
of integrating all four tools for efficient classification and rule generation.
The significance of the soft computing approach in data mining and knowledge
discovery is finally discussed, along with the scope of future research.



29.1 Introduction 

Soft computing is a consortium of methodologies which work synergistically
and provide, in one form or another, flexible information processing
capabilities for handling real-life ambiguous situations. Its aim is to exploit the
tolerance for imprecision, uncertainty, approximate reasoning and partial truth
in order to achieve tractability, robustness, low-cost solutions, and close
resemblance to human-like decision making. In other words, it provides the
foundation for the conception and design of high MIQ (Machine IQ)
systems, and therefore forms the basis of future generation computing systems.
At this juncture, Fuzzy Logic (FL), Rough Sets (RS), Artificial Neural
Networks (ANN) and Genetic Algorithms (GA) are the principal components,
where FL provides algorithms for dealing with imprecision and uncertainty
arising from vagueness rather than randomness, RS for handling uncertainty
arising from limited discernibility of objects, ANN the machinery for learning
and adaptation, and GA for optimization and searching [29.1, 29.2].

Machine recognition of patterns [29.3, 29.4] can be viewed as a two-fold 
task, consisting of learning the invariant and common properties of a set 
of samples characterizing a class, and of deciding that a new sample is a 
possible member of the class by noting that it has properties common to 
those of the set of samples. Therefore, the task of pattern recognition by a 
computer can be described as a transformation from the measurement space 
M to the feature space F and finally to the decision space D. Depending on
the type of input patterns, one may have a speech recognition system, an image
recognition or vision system, a medical diagnostic system, etc.

In this article we first describe the relevance of different soft computing
tools to pattern recognition problems with examples. Different integrations
among them are then described. As an example we explain an evolutionary
rough fuzzy MLP, which has been designed based on a modular concept for
pattern classification and rule generation. Finally, the significance of soft
computing in data mining and knowledge discovery is discussed.



29.2 Relevance of Fuzzy Set Theory in Pattern 
Recognition 

Fuzzy sets were introduced in 1965 by Zadeh [29.5] as a new way to represent 
vagueness in everyday life. They are generalizations of conventional (crisp) 
set theory. Conventional sets contain objects that satisfy precise properties 
required for membership. Fuzzy sets, on the other hand, contain objects that 
satisfy imprecisely defined properties to varying degrees. A fuzzy set A of the 
universe X is defined as a collection of ordered pairs 

$$A = \{(\mu_A(x), x) : x \in X\}$$

where $\mu_A(x)$, $0 \le \mu_A(x) \le 1$, gives the degree of belonging of the element
x to the set A or the degree of possession of an imprecise property represented
by A. Different aspects of fuzzy set theory including membership functions,
basic operations and uncertainty measures can be found in [29.5, 29.6].
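
As a minimal sketch of this definition, the Python fragment below represents a fuzzy set by its membership function and lists it as ordered pairs. The triangular shape, the function names and the gray-level breakpoints (64, 128, 192) are illustrative assumptions, not taken from the text.

```python
def triangular(x, left, peak, right):
    """Triangular membership function mu_A(x) with values in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_set(universe, mu):
    """Return the fuzzy set as a list of ordered pairs (mu_A(x), x)."""
    return [(mu(x), x) for x in universe]


# Hypothetical fuzzy set "medium gray" over a few 8-bit gray levels.
gray_levels = [0, 64, 96, 128, 160, 192, 255]
medium_gray = fuzzy_set(gray_levels, lambda g: triangular(g, 64, 128, 192))
```

A gray level of 128 belongs fully to the set, 96 and 160 belong to it only partially, and the extremes do not belong at all. A crisp set is the special case in which every membership value is 0 or 1.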

In this section we explain some of the uncertainties which one often en- 
counters while designing a pattern recognition system and the relevance of 
fuzzy set theory in handling them. Let us consider, first of all, the case of 
processing and recognition of a gray-tone image pattern. Conventional appro- 
aches to image analysis and recognition [29.7, 29.8] consist of segmenting the 
image into meaningful regions, extracting their edges and skeletons, compu- 
ting various features (e.g., area, perimeter, centroid etc.) and primitives (e.g., 
line, corner, curve etc.) of and relationships among the regions, and finally, 
developing decision rules and grammars for describing, interpreting and/or 
classifying the image and its sub-regions. In a conventional system each of 
these operations involves crisp decisions (i.e., yes or no, black or white, 0 or 1) 
to make regions, features, primitives, properties, relations and interpretations 
crisp. 

Since the regions in an image are not always crisply defined, uncertainty 
can arise within every phase of the aforesaid tasks. Any decision made at a 
particular level will have an impact on all higher level activities. An image 
recognition system should have sufficient provision for representing and mani- 
pulating the uncertainties involved at every processing stage; i.e., in defining 
image regions, features and relations among them, so that the system retains 
as much of the ‘information content’ of the data as possible. If this is done, 
the ultimate output (result) of the system will possess minimal uncertainty
(and unlike conventional systems, it may not be biased or affected as much
by lower level decision components). 

In short, gray information is expensive and informative. Once it is thrown
away, there is no way to get it back. Therefore one should try to retain this
information as long as possible throughout the decision making tasks for its
full use. When it is required to make a crisp decision at the highest level, one
can always throw away or ignore this information.

Let us now consider the case of a decision-theoretic approach to pattern 
classification. With the conventional probabilistic and deterministic classifiers 
[29.3, 29.4], the features characterizing the input patterns are considered to 
be quantitative (numeric) in nature. Patterns having imprecise or
incomplete information are usually ignored or discarded during the design and
testing of such classifiers. The imprecision (or ambiguity) may arise from various
causes. For example, instrumental error or noise corruption in the experiment 
may lead to only partial or partially reliable information being available on 
a feature measurement F. Again, in some cases it may become convenient 
to use linguistic variables and hedges. In such cases, it is not appropriate to 
give exact representation to uncertain feature data. Rather, it is reasonable 
to represent uncertain feature information by fuzzy subsets. 

Again, uncertainty in classification or clustering of patterns may arise 
from the overlapping nature of the various classes. This overlapping may re- 
sult from fuzziness or randomness. In the conventional technique, it is usually 
assumed that a pattern may belong to only one class, which is not necessarily 
true in real life applications. A pattern can and should be allowed to have 
degrees of membership in more than one class. It is, therefore, necessary to 
convey this information while classifying a pattern or clustering a data set. 
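
To illustrate graded class membership, the sketch below assigns a pattern a membership value in every class from its distances to class prototypes. The inverse-distance rule is the one used in fuzzy c-means clustering; it is chosen here purely for illustration and is not a method described in this chapter. The prototypes and the fuzzifier value are made up.

```python
import math


def class_memberships(pattern, prototypes, fuzzifier=2.0):
    """Fuzzy c-means style memberships of one pattern in every class.

    `prototypes` maps a class label to a representative point. The
    inverse-distance rule is an illustrative assumption.
    """
    distances = {c: math.dist(pattern, p) for c, p in prototypes.items()}
    # A pattern that coincides with a prototype belongs fully to that class.
    exact = [c for c, d in distances.items() if d == 0.0]
    if exact:
        return {c: (1.0 if c == exact[0] else 0.0) for c in distances}
    power = 2.0 / (fuzzifier - 1.0)
    inverse = {c: d ** -power for c, d in distances.items()}
    total = sum(inverse.values())
    return {c: v / total for c, v in inverse.items()}


# Two hypothetical classes on a line; the pattern lies closer to class "A".
prototypes = {"A": (0.0, 0.0), "B": (4.0, 0.0)}
memberships = class_memberships((1.0, 0.0), prototypes)
```

The memberships sum to one, and the pattern keeps a small but non-zero membership in the farther class instead of being forced into a single class.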

From the aforementioned examples, we see that the concept of fuzzy sets 
can be used at the feature level in representing input data as an array of 
membership values denoting the degree of possession of certain properties, 
in representing linguistically phrased input features for their processing, in 
weakening the strong commitments for extracting ill-defined image regions, 
properties, primitives, and relations among them, and at the classification 
level, for representing class membership of objects in terms of membership 
values. In other words, fuzzy set theory provides a notion of embedding: we
find a better solution to a crisp problem by looking in a larger space at first,
which has different (usually fewer) constraints and therefore allows the
algorithm more freedom to avoid errors forced by commitment to hard answers in
intermediate stages.

The capability of fuzzy set theory in pattern recognition problems has
been reported adequately since the late sixties. A cross-section of the advances
with applications is available in [29.6, 29.2, 29.9].



29.3 Relevance of Neural Network Approaches

Neural network (NN) models [29.10, 29.11] try to emulate the biological neu- 
ral network/nervous system with electronic circuitry. NN models have been 
studied for many years with the hope of achieving human-like performance 
(artificially), particularly in the field of pattern recognition, by capturing the 
key ingredients responsible for the remarkable capabilities of the human ner- 
vous system. Note that these models are extreme simplifications of the actual 
human nervous system. 

NNs are designated by the network topology, connection strength between 
pairs of neurons (called weights), node characteristics and the status updating 
rules. Node characteristics mainly specify the primitive types of operations 
it can perform, like summing the weighted inputs coming to it and then am- 
plifying it or doing some fuzzy aggregation operations. The updating rules 
may be for weights and/or states of the processing elements (neurons). Nor- 
mally an objective function is defined which represents the complete status of 
the network and the set of minima of it corresponds to the set of stable states 
of the network. Since there are interactions among the neurons the collective 
computational property inherently reduces the computational task and ma- 
kes the system fault tolerant. Thus NN models are also suitable for tasks 
where collective decision making is required. Hardware implementations of
neural networks have also been attempted.
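
The node operation described above, summing the weighted inputs and then passing the sum through an amplifying function, can be sketched as follows. The sigmoid is an assumed choice of amplifying function, and the weights and biases are arbitrary illustrative numbers.

```python
import math


def neuron_output(inputs, weights, bias=0.0):
    """Sum the weighted inputs, then squash the sum with a sigmoid."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))


def layer_output(inputs, weight_rows, biases):
    """Apply several neurons to the same inputs (one weight row per neuron)."""
    return [neuron_output(inputs, row, b) for row, b in zip(weight_rows, biases)]


# Two neurons sharing the same two inputs; both weighted sums come out as zero.
outputs = layer_output([1.0, 0.5], [[0.2, -0.4], [1.0, 1.0]], [0.0, -1.5])
```

Each neuron in a layer works on the same inputs independently of the others, which is the parallel and distributed character the text refers to.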

Neural network based systems are usually reputed to enjoy the following 
major characteristics: 

— adaptivity: adjusting the connection strengths to new data/information,

— speed: due to massively parallel architecture,

— robustness: to missing, confusing, ill-defined/noisy data,

— ruggedness: to failure of components,

— optimality: as regards error rates in performance.

For any pattern recognition system, one desires to achieve the above-mentioned
characteristics. Moreover, there exists a direct analogy between the
working principles of many pattern recognition tasks and neural network
models. For example, image processing and analysis in the spatial domain mainly
employ simple arithmetic operations at each pixel site in parallel. These ope- 
rations usually involve information of neighboring pixels (co-operative proces- 
sing) in order to reduce the local ambiguity and to attain global consistency. 
An objective measure is required (representing the overall status of the sy- 
stem), the optimum of which represents the desired goal. The system thus 
involves collective decisions. On the other hand, we notice that neural net- 
work models are also based on parallel and distributed working principles 
(all neurons work in parallel and independently). The operations performed
at each processor site are also simple and independent of the others. The
overall status of a neural network can also be measured.

Again, the task of recognition in a real-life problem involves searching a
complex decision space. This becomes more complicated particularly when
there is no prior information on class distribution. Neural network based
systems use adaptive learning procedures, learn from examples and attempt
to find a useful relation between input and output, however complex it may
be, for decision-making problems. Neural networks are also reputed to model
complex non-linear boundaries and to discover important underlying
regularities in the task domain. These characteristics call for methods
for constructing and refining neural network models for various
recognition tasks. In short, neural networks are natural classifiers having
resistance to noise, tolerance to distorted images/patterns (ability to generalize),
superior ability to recognize partially occluded or degraded images/overlapping
pattern classes or classes with highly nonlinear boundaries, and potential for
parallel processing.



29.4 Genetic Algorithms for Pattern Recognition 

Genetic Algorithms (GAs) [29.12, 29.13, 29.14, 29.15] are adaptive compu- 
tational procedures modeled on the mechanics of natural genetic systems. 
They express their ability by efficiently exploiting the historical information 
to speculate on new offspring with expected improved performance [29.12]. 
GAs are executed iteratively on a set of coded solutions, called a population,
with three basic operators: selection/reproduction, crossover and mutation. 
They use only the payoff (objective function) information and probabilistic 
transition rules for moving to the next iteration. They are different from most 
of the normal optimization and search procedures in four ways: 

— GAs work with the coding of the parameter set, not with the parameters
themselves.

— GAs work simultaneously with multiple points, and not a single point. 

— GAs search via sampling (a blind search) using only the payoff information. 

— GAs search using stochastic operators, not deterministic rules. 
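
The properties listed above can be seen in a minimal GA sketch. It works on bit-string codings, keeps a population of points, uses only the payoff value, and applies stochastic selection, crossover and mutation. The OneMax payoff, the tournament selection, the elitism step and all parameter values are illustrative assumptions rather than choices made in the text.

```python
import random


def one_max(bits):
    """Payoff function: the number of ones in the coded solution."""
    return sum(bits)


def run_ga(n_bits=20, pop_size=30, generations=40, p_cross=0.8,
           p_mut=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        best = max(population, key=one_max)
        best_history.append(one_max(best))
        next_population = [best[:]]  # elitism (assumed): keep the best string
        while len(next_population) < pop_size:
            # Selection: binary tournament using only payoff information.
            p1 = max(rng.sample(population, 2), key=one_max)
            p2 = max(rng.sample(population, 2), key=one_max)
            # Crossover: single cut point, applied with probability p_cross.
            if rng.random() < p_cross:
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit independently with probability p_mut.
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            next_population.append(child)
        population = next_population
    best = max(population, key=one_max)
    best_history.append(one_max(best))
    return best, best_history


best, history = run_ga()
```

Because the best string is copied unchanged into the next generation, the recorded best payoff never decreases from one generation to the next.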

One may note that the methods developed for pattern recognition and 
image processing are usually problem dependent. Moreover, many tasks in- 
volved in the process of analyzing/identifying a pattern need appropriate 
parameter selection and efficient search in complex spaces in order to obtain 
optimal solutions. This not only makes the process computationally
intensive, but also leads to a possibility of losing the exact solution. Therefore, the
application of genetic algorithms to pattern recognition problems
that need optimization of computation requirements, and robust,
fast and close approximate solutions, appears to be appropriate and natural
[29.13].



29.5 Integration and Hybrid Systems

Integration of the individual soft computing tools helps in designing hybrid
systems which are more versatile and efficient compared to stand-alone use
of the tools. The most visible integration in the soft computing community is
that of neural networks and fuzzy sets [29.2]. Neuro-fuzzy systems have been
successfully developed for decision making, pattern recognition and image
processing tasks. The hybridization falls in two major categories: a neural
network equipped with the capability of handling fuzzy information (termed
fuzzy neural network) to augment its application domain, and a fuzzy system
augmented by neural networks to enhance some of its characteristics like
flexibility, speed, adaptability and learning (termed neural-fuzzy systems). Both
classes of hybridization and their application to various pattern recognition
problems are described in [29.2].

There are some applications where the integration of GAs with fuzzy sets
and ANNs is found to be effective. For example, GAs are sometimes found
essential for overcoming some of the limitations of fuzzy set theory,
specifically to reduce the ‘subjective’ nature of membership functions. Note that
the other way of integration, i.e., incorporating the concept of fuzziness into
GAs, has not been tried seriously. Synthesis of ANN architectures can be done
using GAs, as an example of neuro-genetic systems. Such an integration may
help in designing an optimum ANN architecture with appropriate parameter sets.
Methods for designing neural network architectures using GAs are primarily
divided into two groups. In one, the GA replaces the learning method to
find appropriate connection weights of some predefined architecture. In the
other, GAs are used to find the architecture itself, which is then evaluated
using some learning algorithm. Literature is also available on integration of
fuzzy sets, neural networks and genetic algorithms [29.2, 29.16, 29.17].

The theory of rough sets [29.18] has emerged as another major mathema- 
tical approach for managing uncertainty that arises from inexact, noisy, or 
incomplete information. It is turning out to be methodologically significant 
to the domains of artificial intelligence and cognitive sciences, especially in 
the representation of and reasoning with vague and/or imprecise knowledge, 
data classification, data analysis, machine learning, and knowledge discovery 
[29.19]. 

Recently, rough sets have been integrated with both fuzzy sets and
neural networks. Several rough-fuzzy hybrid systems are discussed in [29.2]. In
the framework of rough-neuro integration [29.20], two broad approaches are
available, namely, use of rough sets for encoding weights of knowledge-based
networks [29.21], and designing neural network architectures which
incorporate roughness at the neuronal level. Genetic algorithms have also been used
for fast generation of rough set reducts from an indiscernibility matrix.

In the next section we describe, as an example, a methodology for
integrating all four soft computing tools, viz., fuzzy sets, ANN, rough sets
and GAs, for classification and rule generation. Here rough sets are used to
encode domain knowledge in network parameters of a fuzzy MLP. GAs are
used to evolve the optimal architecture based on a modular concept.



29.6 Evolutionary Rough Fuzzy MLP 

The evolutionary rough fuzzy MLP utilises the concept of modular learning
for better integration and performance enhancement [29.22]. The knowledge
flow structure of the evolutionary rough fuzzy MLP is illustrated in Figure 29.1.
Here each of the soft computing tools acts synergistically to contribute to
the final performance of the system as follows. Rough set rules are used for
extracting crude domain knowledge, which when encoded in a fuzzy MLP
results not only in fast training of the network, but also in automatic
determination of the network size. The GA operators are adaptive and use the
domain knowledge extracted with rough sets for even faster learning. The
fuzziness incorporated at the input and outputs helps in better handling of
uncertainties and overlapping classes. The nature of integration is illustrated
in Figure 29.2.

Fig. 29.1. Knowledge Flow in Modular Rough Fuzzy MLP

Fig. 29.2. Components of the Modular Rough-fuzzy MLP



The evolutionary modular rough fuzzy MLP has been applied to a number
of real-world problems like speech recognition and medical diagnosis. In the case
of speech recognition [29.22], the system is found to correctly classify 84%
of the samples, while the fuzzy MLP correctly classifies only 78% and the
MLP only 59%. The system also gains significantly in computation time.
For determining the stages of cervical cancer [29.22], the system provides
results identical to those of medical experts in 83% of the cases. In the other
cases the stagings were also close. In addition to the above performance, logical
rules were extracted from the trained system. It was found that the rules coincided
with the guidelines adopted by medical practitioners for staging. In the rough
fuzzy MLP, the final network has a structure imposed on the weights. Hence,
crisp logical rules can be easily extracted from the networks. This makes the
system suitable for Knowledge Discovery in Databases. The rules obtained
are found to be superior to those of several popular methods, as measured
with some quantitative indices. For example, on the speech recognition data,
the rules obtained using the modular rough-fuzzy MLP have an accuracy of
81.02% with 10 rules, while the popular C4.5 rule generation algorithm has
an accuracy of 75.00% using 16 rules. The fraction of samples which are ‘uncovered’
by the rules obtained by us is only 3.10%, whereas the C4.5 rules have 7.29%
uncovered samples. The ‘confusion index’ is also low for the proposed method
(1.4) compared to C4.5 (2.0).



29.7 Data Mining and Knowledge Discovery 

In recent years, the rapid advances being made in computer technology have 
ensured that large sections of the world population have been able to gain 
easy access to computers on account of falling costs worldwide, and their 
use is now commonplace in all walks of life. Government agencies, scientific, 
business and commercial organizations are routinely using computers not just 
for computational purposes but also for storage, in massive databases, of the 
immense volumes of data that they routinely generate, or require from other 
sources. Large-scale computer networking has ensured that such data has 
become accessible to more and more people. In other words, we are in the
midst of an information explosion, and there is an urgent need for methodologies
that will help us bring some semblance of order into the phenomenal volumes
of data that can readily be accessed by us with a few clicks of the keys of our
computer keyboard. Traditional statistical data summarization and database
management techniques are just not adequate for handling data on this scale,
and for intelligently extracting information or, rather, knowledge that may
be useful for exploring the domain in question or the phenomena responsible
for the data, and providing support to decision-making processes. This quest
has thrown up some new phrases, for example, data mining and knowledge
discovery in databases (KDD), which are perhaps self-explanatory, but will
be briefly discussed in the next few paragraphs. Their relationship with the
discipline of pattern recognition will also be examined.

Fig. 29.3. Block diagram for Knowledge Discovery in Databases (KDD)

The massive databases that we are talking about are generally characteri- 
zed by the presence of not just numeric, but also textual, symbolic, pictorial 
and aural data. They may contain redundancy, errors, imprecision, and so on. 
KDD is aimed at discovering natural structures within such massive and often
heterogeneous data. Therefore, pattern recognition (PR) plays a significant role
in the KDD process.
However, KDD is being visualized as not just being capable of knowledge 
discovery using generalizations and magnifications of existing and new pat- 
tern recognition algorithms, but also the adaptation of these algorithms to 
enable them to process such data, the storage and accessing of the data, its 
preprocessing and cleaning, interpretation, visualization and application of 
the results, and the modeling and support of the overall human-machine in- 
teraction. What really makes KDD feasible today and in the future is the 
rapidly falling cost of computation, and the simultaneous increase in com- 
putational power, which together make possible the routine implementation 
of sophisticated, robust and efficient methodologies hitherto thought to be 
too computation-intensive to be useful. A block diagram of KDD is given in 
Figure 29.3. 

Data mining is that part of knowledge discovery which deals with the
process of identifying valid, novel, potentially useful, and ultimately
understandable patterns in data, and excludes the knowledge interpretation part
of KDD. Therefore, as it stands now, data mining can be viewed as applying
PR and machine learning principles in the context of voluminous, possibly 
heterogeneous data sets. Furthermore, soft computing-based (involving fuzzy 
sets, neural networks, genetic algorithms and rough sets) PR methodologies 
and machine learning techniques seem to hold great promise for data mining. 
The motivation for this is provided by their ability to handle imprecision, 
vagueness, uncertainty, approximate reasoning and partial truth and lead 
to tractability, robustness and low-cost solutions. In this context, case-based 
reasoning [29.17], which is a novel Artificial Intelligence (AI) problem-solving 
paradigm, has a significant role to play, as is evident from the recent book 
edited by Pal, Dillon and Yeung [29.17]. 

Some of the challenges that researchers in this area are likely to deal with, 
include those posed by massive data sets and high dimensionality, nonstan- 
dard and incomplete data, and overfitting. The focus is most likely to be on 
aspects like user interaction, use of prior knowledge, assessment of statisti- 
cal significance, learning from mixed media data, management of changing 
data and knowledge, integration of tools, ways of making knowledge disco- 
very more understandable to humans by using rules, visualization, etc., and 
so on. We believe the next decade will bear testimony to this.



30.1 Concepts of Upper and Lower Possibility Distributions

Knowledge from one expert can be represented by a data set $\{(x_i, h_i) \mid i = 1, \dots, m\}$,
where $x_i = [x_{i1}, \dots, x_{in}]^t$ is an n-dimensional vector characterizing some specified
event, $h_i$ is an associated possibility grade given by an expert to reflect his
judgement on what the possibility grade of the ith sample is for this event, and m
is the number of samples. The data set $(x_i, h_i)$ can be approximated by
dual data sets $(x_i, h_{li})$ and $(x_i, h_{ui})$ $(i = 1, \dots, m)$ with the condition $h_{li} \le h_i \le h_{ui}$.
Assume that the values $h_{li}$ and $h_{ui}$ are from a class of functions $G(x, \theta)$ with
parameter vector $\theta$. Let $G(x_i, \theta_l)$ and $G(x_i, \theta_u)$ correspond to $h_{li}$ and $h_{ui}$,
respectively, and denote them simply as $\pi_l(x_i)$ and $\pi_u(x_i)$. Given the data
set $(x_i, h_i)$ $(i = 1, \dots, m)$, the objective of estimation is to obtain two optimal
parameter vectors $\theta_u^*$ and $\theta_l^*$ from the parameter space to approximate $(x_i, h_i)$
from upper and lower directions according to some given measure. Moreover, the
dual optimal parameter vectors $(\theta_u^*, \theta_l^*)$ make the relation $G(x, \theta_l^*) \le G(x, \theta_u^*)$
hold for any arbitrary n-dimensional vector x.

Suppose that the function $G(x, \theta)$ is an exponential function
$\exp\{-(x - a)^t D_A^{-1} (x - a)\}$, simply denoted as $(a, D_A)_e$. Then the following
formulas hold.

$$\pi_l(x_i) = \exp\{-(x_i - a)^t D_l^{-1} (x_i - a)\}, \quad i = 1, \dots, m, \qquad (1)$$

$$\pi_u(x_i) = \exp\{-(x_i - a)^t D_u^{-1} (x_i - a)\}, \quad i = 1, \dots, m, \qquad (2)$$

$$\pi_l(x_i) \le h_i \le \pi_u(x_i) \ \text{and} \ \pi_l(x) \le \pi_u(x), \quad i = 1, \dots, m, \qquad (3)$$

where $a = [a_1, a_2, \dots, a_n]^t$ is a center vector, and $D_u$ and $D_l$ are positive definite
matrices, denoted as $D_u > 0$ and $D_l > 0$, respectively. It can be seen that in the
above exponential function, the vector a and the matrices $D_u$ and $D_l$ are parameters to
be solved. Different parameters a, $D_u$ and $D_l$ lead to different values $\pi_l(x_i)$
and $\pi_u(x_i)$, which approximate the given possibility degree $h_i$ of $x_i$ to
different extents.
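
A minimal sketch of formulas (1) to (3) in plain Python follows. The center vector, the two matrices and the helper names are illustrative assumptions; the matrices are chosen so that the difference between the upper and lower matrix is positive semidefinite, which makes the upper distribution dominate the lower one.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    z = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * z[k] for k in range(row + 1, n))
        z[row] = (aug[row][n] - tail) / aug[row][row]
    return z


def exp_possibility(x, center, D):
    """pi(x) = exp(-(x - a)^t D^{-1} (x - a)) for a positive definite D."""
    diff = [xi - ai for xi, ai in zip(x, center)]
    z = solve(D, diff)  # z = D^{-1} (x - a) without forming the inverse
    return math.exp(-sum(d * zi for d, zi in zip(diff, z)))


a = [0.0, 0.0]                      # center vector (illustrative)
D_l = [[1.0, 0.0], [0.0, 0.5]]      # matrix of the lower distribution
D_u = [[2.0, 0.5], [0.5, 1.5]]      # matrix of the upper distribution

x = [1.0, 1.0]
pi_l = exp_possibility(x, a, D_l)   # quadratic form is 1/1 + 1/0.5 = 3
pi_u = exp_possibility(x, a, D_u)
```

Both distributions equal 1 at the center and decay away from it; the lower one decays faster, so a grade $h_i$ can be sandwiched between the two values at each sample.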

Definition 1. Given the formulas (1), (2) and (3), the fitness of approximation
based on parameters a, $D_u$ and $D_l$, denoted as $\beta$, is defined as follows:

$$\beta = \sqrt[m]{\prod_{i=1}^{m} \frac{\pi_l(x_i)}{\pi_u(x_i)}} \qquad (4)$$

It is known from Definition 1 that the higher the parameter $\beta$ is, the closer to
$h_i$ the values $\pi_l(x_i)$ and $\pi_u(x_i)$ are from lower and upper directions, respectively.
$\prod_{i=1}^{m} \pi_l(x_i)$ and $\prod_{i=1}^{m} \pi_u(x_i)$ can be regarded as likelihood functions for lower and
upper possibility distributions.
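
The fitness measure can be sketched numerically as follows, assuming that it is the geometric mean of the ratios of lower to upper possibility values, as the logarithmic form used later in the chapter suggests. The sample values are made up for illustration.

```python
import math


def fitness(pi_lower, pi_upper):
    """Geometric mean of the ratios pi_l(x_i) / pi_u(x_i), computed via logs."""
    m = len(pi_lower)
    log_sum = sum(math.log(l) - math.log(u) for l, u in zip(pi_lower, pi_upper))
    return math.exp(log_sum / m)


# Hypothetical expert grades and two candidate pairs of bounds around them.
h = [0.9, 0.5, 0.2]
loose = fitness([0.6, 0.2, 0.05], [1.0, 0.8, 0.5])
tight = fitness([0.85, 0.45, 0.18], [0.95, 0.55, 0.22])
```

The tighter pair of bounds yields the higher fitness, and the value reaches 1 only when the lower and upper values coincide at every sample.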

Definition 2. Denote the optimal solutions of a, $D_u$ and $D_l$ as $a^*$, $D_u^*$ and
$D_l^*$, respectively, which maximize $\beta$ with the constraint (3). The following
functions

$$\pi_l^*(x) = \exp\{-(x - a^*)^t (D_l^*)^{-1} (x - a^*)\} \qquad (5)$$

$$\pi_u^*(x) = \exp\{-(x - a^*)^t (D_u^*)^{-1} (x - a^*)\} \qquad (6)$$

are called lower and upper exponential possibility distributions of the possibility
vector x, respectively. For simplicity, afterwards we write $\pi_u(x)$ and $\pi_l(x)$
instead of $\pi_u^*(x)$ and $\pi_l^*(x)$.



30.2 Comparison of Dual Possibility Distributions with Dual 
Approximations in Rough Sets Theory 

Rough sets theory has been proposed by Pawlak and extensively applied to classi- 
fication problems, machine learning, and decision analysis etc. [1, 2]. For com- 
paring the dual possibility distributions with the rough sets, the basic notions of 
rough sets are introduced below. 

Let U be the universe of objects and R be an equivalence relation in U. Then by
U/R we mean the family of all equivalence classes of R. Equivalence classes of the
relation R are called elementary sets. Any finite union of elementary sets is said to
be a definable set. Given a set Z, the upper and lower approximations of Z,
denoted as $R^*(Z)$ and $R_*(Z)$, respectively, are two definable sets defined as
follows:

$$R^*(Z) = \bigcup \{Y \in U/R : Y \cap Z \neq \emptyset\}, \qquad (7)$$

$$R_*(Z) = \bigcup \{Y \in U/R : Y \subseteq Z\}, \qquad (8)$$

where $\emptyset$ is the empty set.

It can be seen that the upper approximation of Z is defined as the least definable
set containing the set Z and the lower approximation of Z is defined as the greatest
definable set contained in Z, so that the condition $R^*(Z) \supseteq R_*(Z)$ holds. An
accuracy measure of a set Z, denoted as $\alpha(Z)$, is defined as

$$\alpha(Z) = \frac{Card(R_*(Z))}{Card(R^*(Z))}, \qquad (9)$$

where $Card(R_*(Z))$ and $Card(R^*(Z))$ are the cardinalities of $R_*(Z)$ and
$R^*(Z)$.
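
The definitions (7), (8) and (9) translate directly into set operations. The sketch below assumes the equivalence classes are given explicitly as a list of sets; the six-element universe and the target set are hypothetical.

```python
def approximations(partition, target):
    """Lower and upper approximations of `target` w.r.t. a partition U/R."""
    target = set(target)
    lower, upper = set(), set()
    for block in partition:
        block = set(block)
        if block <= target:        # Y is contained in Z
            lower |= block
        if block & target:         # Y intersects Z
            upper |= block
    return lower, upper


def accuracy(partition, target):
    """alpha(Z) = Card(lower) / Card(upper); defined for non-empty Z."""
    lower, upper = approximations(partition, target)
    return len(lower) / len(upper)


# Hypothetical universe {1, ..., 6} with three elementary sets.
U_over_R = [{1, 2}, {3, 4}, {5, 6}]
Z = {1, 2, 3}
lower, upper = approximations(U_over_R, Z)
```

Here the elementary set {3, 4} intersects Z without being contained in it, so it enters the upper approximation only, and the accuracy of Z is 2/4.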



30.3 Identification of Upper and Lower Possibility Distributions 



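The upper and lower approximations of Z can be regarded as the optimal solutions
of the following optimization problem.

$$\max_{R_u(Z),\, R_l(Z)} \; \alpha(Z) = \frac{Card(R_l(Z))}{Card(R_u(Z))} \qquad (10)$$

$$\text{s.t.} \quad R_l(Z) \subseteq Z \subseteq R_u(Z),$$

where $R_l(Z)$ and $R_u(Z)$ are definable sets by U/R. Similarly, the model to
identify the upper and lower possibility distributions can be formulated to
maximize the fitness measure as follows:

$$\max_{a,\, D_u,\, D_l} \; \beta = \sqrt[m]{\prod_{i=1}^{m} \frac{\pi_l(x_i)}{\pi_u(x_i)}} \qquad (11)$$

$$\text{s.t.} \quad \pi_l(x_i) \le h_i \le \pi_u(x_i), \quad i = 1, \dots, m,$$

$$\pi_u(x) \ge \pi_l(x).$$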

The corresponding relations between dual approximations and dual possibility 
distributions are listed in Table 1. 




30. Identifying Upper and Lower Possibility Distributions



Table 1. The similarities between rough sets and possibility distributions

| Possibility distributions | Rough sets |
|---|---|
| Upper distribution: $\pi_u(x)$ | Upper approximation: $R^*(Z)$ |
| Lower distribution: $\pi_l(x)$ | Lower approximation: $R_*(Z)$ |
| Product of $\pi_u(x_i)$: $\prod_{i=1}^{m} \pi_u(x_i)$ | Cardinality of $R^*(Z)$: $Card(R^*(Z))$ |
| Product of $\pi_l(x_i)$: $\prod_{i=1}^{m} \pi_l(x_i)$ | Cardinality of $R_*(Z)$: $Card(R_*(Z))$ |
| Inequality relation: $\pi_u(x) \ge \pi_l(x)$ | Inclusion relation: $R^*(Z) \supseteq R_*(Z)$ |
| Measure of fitness: $\beta = \sqrt[m]{\prod_{i=1}^{m} \pi_l(x_i)/\pi_u(x_i)}$ | Accuracy measure of a set Z: $\alpha(Z) = Card(R_*(Z))/Card(R^*(Z))$ |



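It is straightforward that the objective function and constraints of (11) correspond
to the objective function and constraints of (10), respectively. Considering that
maximizing $\prod_{i=1}^{m} \pi_l(x_i)/\pi_u(x_i)$ is equivalent to maximizing

$$\frac{1}{m} \ln \prod_{i=1}^{m} \frac{\pi_l(x_i)}{\pi_u(x_i)} = \frac{1}{m} \sum_{i=1}^{m} \left( \ln \pi_l(x_i) - \ln \pi_u(x_i) \right),$$

the optimization problem (11) can be rewritten as follows:

$$\begin{aligned}
\min_{a,\,D_u,\,D_l} \quad & \sum_{i=1}^{m} (x_i - a)' D_l^{-1} (x_i - a) - \sum_{i=1}^{m} (x_i - a)' D_u^{-1} (x_i - a) \\
\text{s.t.} \quad & (x_i - a)' D_l^{-1} (x_i - a) \ge -\ln h_i, \quad i = 1, \ldots, m, \\
& (x_i - a)' D_u^{-1} (x_i - a) \le -\ln h_i, \quad i = 1, \ldots, m, \\
& D_u - D_l \ge 0, \\
& D_l > 0.
\end{aligned} \tag{12}$$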



It should be noted that the optimization problem (12) is equivalent in form to the
integrated model proposed in [3,4]. However, they arise from very different
considerations. The latter was used to integrate two optimization problems to obtain
upper and lower possibility distributions simultaneously. The former is used to seek
an optimal center vector $a$ and positive definite matrices $D_u$ and $D_l$ to
maximize the fitness measure $J_D$ defined in formula (4). Model (11) makes it quite
clear that upper and lower possibility distributions have a structure very similar to
that of the upper and lower approximations in rough set theory. In the following, let
us consider how to obtain the center vector $a$ and the positive definite matrices
$D_l$ and $D_u$.

The center vector $a$ can be approximately estimated as

$$a = x_{i^*}, \tag{13}$$

where $x_{i^*}$ denotes the vector whose grade is $h_{i^*} = \max_{k=1,\ldots,m} h_k$.
The associated possibility grade of $x_{i^*}$ is revised to be 1 because it becomes
the center vector. Taking the transformation $y = x - a$, the problem (12) is changed
into the following problem.



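$$\begin{aligned}
\min_{D_u,\,D_l} \quad & \sum_{i=1}^{m} y_i' D_l^{-1} y_i - \sum_{i=1}^{m} y_i' D_u^{-1} y_i \\
\text{s.t.} \quad & y_i' D_l^{-1} y_i \ge -\ln h_i, \quad i = 1, \ldots, m, \\
& y_i' D_u^{-1} y_i \le -\ln h_i, \quad i = 1, \ldots, m, \\
& D_u - D_l \ge 0, \\
& D_l > 0.
\end{aligned} \tag{14}$$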
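The center-vector step (13) and the shift $y = x - a$ can be sketched in a few lines of Python. This is only an illustration under the assumption that the data are given as a list of tuples with one possibility grade each; the function name `center_and_shift` and the sample data are hypothetical.

```python
def center_and_shift(xs, hs):
    """Pick the data vector with the largest grade as center a, set its grade to 1,
    and return (a, shifted data y_i = x_i - a, revised grades)."""
    if not xs or len(xs) != len(hs):
        raise ValueError("xs and hs must be non-empty and of equal length")
    i_star = max(range(len(hs)), key=lambda i: hs[i])   # index of the largest grade
    a = xs[i_star]
    grades = list(hs)
    grades[i_star] = 1.0                                # the center gets grade 1
    ys = [tuple(xi - ai for xi, ai in zip(x, a)) for x in xs]
    return a, ys, grades

# Hypothetical data: three 2-dimensional vectors with expert grades.
xs = [(1.0, 2.0), (3.0, 4.0), (5.0, 1.0)]
hs = [0.4, 0.9, 0.6]
a, ys, grades = center_and_shift(xs, hs)
print(a, ys, grades)
```

After the shift the center itself becomes the zero vector, which is why the distributions in (14) no longer contain $a$.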

Formula (14) is a nonlinear optimization problem due to the last two constraints.
To cope with this difficulty, we use principal component analysis (PCA) to rotate the
given data $(y_i, h_i)$ to obtain a positive definite matrix easily. The data $y_i$
($i = 1, \ldots, m$) can be transformed by a linear transformation matrix $T$ whose
columns are eigenvectors of the matrix $\Sigma = [\sigma_{ij}]$, where $\sigma_{ij}$
is defined as

$$\sigma_{ij} = \frac{\sum_{k=1}^{m} (x_{ki} - a_i)(x_{kj} - a_j)\, h_k}{\sum_{k=1}^{m} h_k}. \tag{15}$$

Using the linear transformation matrix $T$, the data $y_i$ is transformed into
$\{z_i = T' y_i\}$. Then formulas (1) and (2) can be rewritten as follows:

$$\pi_l(z_i) = \exp\{-z_i' T' D_l^{-1} T z_i\}, \tag{16}$$

$$\pi_u(z_i) = \exp\{-z_i' T' D_u^{-1} T z_i\}. \tag{17}$$



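Since $T$ is obtained by PCA, $T' D_u^{-1} T$ and $T' D_l^{-1} T$ can be assumed to be
diagonal matrices as follows:

$$C_u = T' D_u^{-1} T = \mathrm{diag}(c_{u1}, \ldots, c_{un}), \tag{18}$$

$$C_l = T' D_l^{-1} T = \mathrm{diag}(c_{l1}, \ldots, c_{ln}). \tag{19}$$

Model (14) can be rewritten as the following LP problem:

$$\begin{aligned}
\min_{C_l,\,C_u} \quad & \sum_{i=1}^{m} z_i' C_l z_i - \sum_{i=1}^{m} z_i' C_u z_i \\
\text{s.t.} \quad & z_i' C_u z_i \le -\ln h_i, \quad i = 1, \ldots, m, \\
& z_i' C_l z_i \ge -\ln h_i, \quad i = 1, \ldots, m, \\
& c_{lj} \ge c_{uj} \ge \varepsilon > 0, \quad j = 1, \ldots, n,
\end{aligned} \tag{20}$$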



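where the condition $c_{lj} \ge c_{uj} \ge \varepsilon > 0$ makes the matrix
$D_u - D_l$ positive semidefinite and the matrices $D_u$ and $D_l$ positive definite.
Denote the optimal solutions of (20) as $C_u^*$ and $C_l^*$. Thus, we have

$$D_u^* = T\, C_u^{*-1}\, T', \qquad D_l^* = T\, C_l^{*-1}\, T'. \tag{21}$$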



30.4 Conclusions 

In this paper, upper and lower possibility distributions are identified to
approximate, from above and from below, the given possibility grades, which are
regarded as the expert's knowledge. The upper possibility distribution reflects the
optimistic viewpoint of the expert and the lower possibility distribution reflects
the pessimistic one. The similarities between dual possibility distributions and upper
and lower approximations in rough set theory are investigated. It is clear that
they have homogeneous structures.
We introduce the notion of a fractal in an information system and we define
a dimension function of a fractal in an information system parallel to the
Minkowski dimension in Euclidean spaces. We prove basic properties of this
new dimension.



31.1 Introduction 

Objects now called "fractals" have been investigated since the 1920s (cf. [31.3],
[31.6]), yet the renewed interest in them goes back to the 1970s in connection with
studies of chaotic behavior, irregular non-smooth sets, dynamic systems,
information compression and computer graphics (cf. [31.9]). The basic
characteristics of "fractals" are rooted in dimension theory. The topological
dimension theory assigns to any subset $T$ of a (sufficiently regular) topological
space $X$ an integer $\dim T \ge -1$ called the dimension of $T$ (cf. [31.7]). This
dimension function, however, does not capture peculiar features of fractals, among
them the periodicity of local structure and the appearance of details at any scale;
for this reason, fractals are evaluated by means of other functions, e.g. the
Hausdorff dimension or the Minkowski (box) dimension, which are better suited to
capturing the peculiarities of local structure.

Many fractal objects can be generated by means of iterations of affine
mappings (iterated function systems, cf. [31.8]); hence they allow for
knowledge compression algorithms (cf. [31.1]; cf. also [31.11] for a rough set
counterpart of the fractal collage theorem).

We are interested here in transferring the notion of a fractal to the general
framework of rough set theory, and we examine some propositions for a
counterpart of fractal dimension in this general framework.



31.2 Fractal Dimensions 

For a set $T \subseteq E^n$ (for properties of fractal dimensions see [31.4], [31.5])
and $s > 0$, $\delta > 0$, one lets $H^s_\delta(T) = \inf \sum_i \mathrm{diam}^s(Q_i)$,
the infimum taken over all families $\{Q_i : i = 1, 2, \ldots\}$ of sets in $E^n$ such
that (i) $T \subseteq \bigcup_i Q_i$ (ii) $\mathrm{diam}(Q_i) < \delta$. Then the limit
$H^s(T) = \lim_{\delta \to 0^+} H^s_\delta(T)$ exists and it follows easily that there
exists a unique $s^*$ with the property that $H^s(T) = \infty$ for $s < s^*$ and
$H^s(T) = 0$ for $s > s^*$. The real number $s^*$ is the Hausdorff dimension of the
set $T$, denoted $\dim_H(T)$. The Hausdorff dimension is too closely related to the
metric structure of the underlying space to admit any substantial abstraction. For
our purposes, the other function, the Minkowski dimension, seems to be better suited.
This dimension has an information-theoretic content and may be transferred, with
changes relaxing its geometric content, into a universe of a general information
system.

For a bounded set $T \subseteq E^n$ (i.e. $\mathrm{diam}(T) < \infty$) and
$\delta > 0$, we denote by $n_\delta(T)$ the least number of n-cubes of diameter less
than $\delta$ that cover $T$. Then we may consider the fraction
$\frac{-\log n_\delta(T)}{\log \delta}$ and evaluate its limit. When the limit
$\lim_{\delta \to 0^+} \frac{-\log n_\delta(T)}{\log \delta}$ exists, it is called the
Minkowski dimension of the set $T$ and it is denoted $\dim_M(T)$. One may interpret
this dimension as an information content (cf. [31.2]) of $T$: the shortest description
of $T$ over an alphabet of $\delta$-cubes has length of order of $\dim_M(T)$.

Both dimensions agree on "standard" fractal objects like the Cantor set
(cf. [31.4], [31.5]); in general they disagree.

An advantage of the Minkowski dimension is that the families of $\delta$-cubes in its
definition may be selected in many ways; one among them is to consider a
$\delta$-grid of cubes of side length $\delta$ on $E^n$ and to count the number
$N_\delta(T)$ of those among them which intersect $T$; then (cf. [31.4], [31.5]) if
$\lim_{\delta \to 0^+} \frac{-\log N_\delta(T)}{\log \delta}$ exists, it is equal to
$\dim_M(T)$.
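The grid-counting recipe can be tried on the Cantor set mentioned above. The following Python sketch is an illustration under stated assumptions: it uses a triadic grid with $\delta = 3^{-k}$ (rather than any particular grid from the text), approximates the Cantor set by the left endpoints of its level-`level` intervals, and stores points as integers scaled by $3^{\text{level}}$ so that the cell index is exact. All function names are hypothetical.

```python
import math
from itertools import product

def cantor_points(level):
    """Left endpoints of the level-`level` Cantor intervals, scaled by 3**level:
    integers whose base-3 expansion with `level` digits uses only digits 0 and 2."""
    points = []
    for digits in product((0, 2), repeat=level):
        value = 0
        for d in digits:
            value = 3 * value + d
        points.append(value)
    return points

def box_count(points, level, k):
    """Number of grid cells of side 3**-k (in scaled units: 3**(level - k)) hit by the points."""
    cell = 3 ** (level - k)
    return len({p // cell for p in points})

def box_dimension_estimate(points, level, k):
    """log N_delta / (-log delta) for delta = 3**-k."""
    return math.log(box_count(points, level, k)) / (k * math.log(3))

level = 10
points = cantor_points(level)
estimate = box_dimension_estimate(points, level, 8)
print(estimate)
```

For every $k$ up to `level` the grid count is $2^k$, so the estimate equals $\log 2 / \log 3$, the known box dimension of the Cantor set, up to floating-point rounding.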



31.3 Rough Sets and Topologies on Rough Sets 

Rough sets arise in an attempt at formalization of the notion of uncertain
knowledge (cf. [31.10], [31.13]). In this paradigm, a knowledge base is an
information system $\mathcal{A} = (U, A)$ where $U$ is the set of objects described
by means of attributes (features, properties) collected in the set $A$. For an object
$x \in U$ and an attribute $a \in A$ we denote by the symbol $a(x)$ the value of $a$
on $x$. We admit here the case when the set $U$ is infinite (e.g. $E^n$) and the set
$A$ consists of countably many attributes $a_n$ where $n = 1, 2, \ldots$. Each
attribute $a_n$ induces on $U$ the $\{a_n\}$-indiscernibility relation $Ind_{a_n}$,
viz. $x\, Ind_{a_n}\, y \Leftrightarrow a_n(x) = a_n(y)$, which partitions $U$ into
classes $[x]_{a_n}$; $\mathcal{P}_n$ is the resulting partition. We may assume that
$\mathcal{P}_{n+1}$ refines $\mathcal{P}_n$ (written
$\mathcal{P}_{n+1} \subseteq \mathcal{P}_n$) for each $n$. A subset (concept)
$Z \subseteq U$ is n-exact in case it is a union of a family of classes of
$Ind_{a_n}$, i.e. $Z = \bigcup \{[z]_{a_n} : z \in Z\}$. Otherwise, $Z$ is said to be
n-rough. Rough sets are approximated by exact sets:
$Z_n^- = \bigcup \{[x]_{a_n} : [x]_{a_n} \subseteq Z\}$ and
$Z_n^+ = \bigcup \{[x]_{a_n} : [x]_{a_n} \cap Z \neq \emptyset\}$. The set $Z_n^-$ is
the lower $a_n$-approximation of $Z$ and the set $Z_n^+$ is the upper
$a_n$-approximation of $Z$. A topological interpretation of $Z_n^-$, $Z_n^+$ as
interior Int, resp. closure Cl, of $Z$ in the topology $\Pi_n$ induced by the
partition $\mathcal{P}_n$ suggests (cf. [31.12]) a topology $\Pi_{\mathcal{A}}$ on the
set $U$ by taking as an open base for this topology the family
$\mathcal{P} = \bigcup_n \mathcal{P}_n$. A set $Z$ is $\Pi_{\mathcal{A}}$-exact in
case $Int_{\Pi_{\mathcal{A}}} Z = Cl_{\Pi_{\mathcal{A}}} Z$; otherwise it is
$\Pi_{\mathcal{A}}$-rough. In this way, we define a taxonomy of sets in $U$: they
may be divided into three classes: sets which are $\Pi_n$-exact, sets which are
$\Pi_{\mathcal{A}}$-exact and sets which are $\Pi_{\mathcal{A}}$-rough (for a detailed
study of topologies on rough sets see [31.12]).

We now consider an information system $\mathcal{A}_c$ on the Euclidean space $E^n$;
this system consists of the universe $U = E^n$ and of attributes $a_k$ for
$k = 1, 2, \ldots$ defined via partitions $\mathcal{P}_k$ induced by relations
$Ind_{a_k}$. The partition $\mathcal{P}_k$ consists of n-cubes of the form

(c) $\prod_{i=1}^{n} \left[ m_i + \frac{j_i}{2^k},\ m_i + \frac{j_i + 1}{2^k} \right)$

where $m_i$ is an integer for each $i = 1, 2, \ldots, n$ and $0 \le j_i \le 2^k - 1$
is an integer.

From the definition of the Minkowski dimension we have

Proposition 1 If the Minkowski dimension $\dim_M(T)$ exists then
$\dim_M(T) = \lim_{k \to \infty} \frac{\log N_k(T)}{k \log 2}$ where $N_k(T)$ is the
number of cubes in $\mathcal{P}_k$ which intersect $T$.

Proposition 2 For any $\Pi_{\mathcal{A}}$-exact set $Z$, we have $\dim_M(Z) = n$.

Proof. Indeed, if a set $Z$ is $\Pi_{\mathcal{A}}$-exact then $Z$ is a union of a
family $\{Q_j : j = 1, 2, \ldots\}$ of n-cubes of the form (c) and thus
$n \ge \dim_M(Z) \ge \dim_M(Q_j) = n$ by the monotonicity and stability of $\dim_M$
(cf. [31.5] Sect. 3.2 and Thm. 3.4).

Corollary 1 Any set $Z$ of fractional dimension $\dim_M$ is a
$\Pi_{\mathcal{A}}$-rough set.

The last fact directs us towards general information systems and rough sets
arising in them.



31.4 Fractals in Information Systems 

For an information system $\mathcal{A} = (U, A)$ with the countable set
$A = \{a_n : n = 1, 2, \ldots\}$ of attributes such that
$Ind_{a_{n+1}} \subseteq Ind_{a_n}$ for $n = 1, 2, \ldots$, we will define the notion
of an $\mathcal{A}$-dimension, denoted $\dim_{\mathcal{A}}$. We observe the
information-theoretic content of the Minkowski dimension and thus, refraining from
any geometric content, we introduce a normalization condition

(N) $\dim_{\mathcal{A}}(Q) = 1$

for every equivalence class $Q$ of any relation $Ind_{a_n}$. The condition (N) assures
us that any equivalence class carries with itself a single bit of information,
thus playing the role of an alphabet symbol.

We restrict ourselves to bounded subsets $Z \subseteq U$, i.e. such $Z$ which for each
$n$ are covered by a finite number of equivalence classes of $Ind_{a_n}$. We may
therefore assume that (i) the number of equivalence classes of $Ind_{a_1}$ is $k_1$
(ii) each class of $Ind_{a_n}$ ramifies into $k_{n+1}$ classes of $Ind_{a_{n+1}}$.

We will say that the information system $\mathcal{A}$ is of type
$\mathbf{k} = (k_i)_i$.

For a bounded set $Z \subseteq U$, we let
$\dim_{\mathcal{A}}(Z) = \lim_{n \to \infty} \frac{\log l_n}{\log \prod_{i=1}^{n} k_i}$
where $l_n$ is the number of classes of $Ind_{a_n}$ that intersect $Z$, i.e. the
number of classes in the upper approximation $Z_n^+$ of $Z$. Then we have

Proposition 3 In case $\mathcal{A}$ is of type $\mathbf{k}$ with $k_j \ge 2$ for
infinitely many $j$, $\dim_{\mathcal{A}}$ does satisfy (N).

Proof. Consider $Q$, a basic open set, so that $Q = [x]_{a_k}$. We have

$$\lim_{n \to \infty} \frac{\log l_n}{\log \prod_{i=1}^{n} k_i} = \lim_{n \to \infty} \frac{\log \prod_{i=k+1}^{n} k_i}{\log \prod_{i=1}^{n} k_i} = 1$$

where $l_n$ is the number of $Ind_{a_n}$ classes that intersect $Q$. Thus
$\dim_{\mathcal{A}}(Q) = 1$.
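The limit in this proof can be checked numerically. The sketch below is a hedged illustration: it assumes the type is supplied as a finite Python list `type_k` holding $k_1, k_2, \ldots$ up to the depth of interest, and it evaluates the ratio $\log l_n / \log \prod_{i \le n} k_i$ for a single class $Q = [x]_{a_k}$, for which $l_n = \prod_{i=k+1}^{n} k_i$. The function name `class_dimension_ratio` is hypothetical.

```python
import math

def class_dimension_ratio(type_k, k, n):
    """Ratio log(l_n) / log(k_1 * ... * k_n) for one class of Ind_{a_k}, where
    l_n = k_{k+1} * ... * k_n is the number of Ind_{a_n}-classes inside that class."""
    if not 0 <= k <= n <= len(type_k):
        raise ValueError("need 0 <= k <= n <= len(type_k)")
    numerator = sum(math.log(ki) for ki in type_k[k:n])    # log of k_{k+1} ... k_n
    denominator = sum(math.log(ki) for ki in type_k[:n])   # log of k_1 ... k_n
    if denominator == 0:
        raise ValueError("all k_i up to n equal 1, so the ratio is undefined")
    return numerator / denominator

# Binary ramification at every level: the ratio is (n - k) / n.
binary_type = [2] * 1000
ratios = [class_dimension_ratio(binary_type, 5, n) for n in (10, 100, 1000)]
print(ratios)
```

With binary ramification the ratio is $(n - k)/n$, which tends to 1 as $n$ grows, in line with condition (N).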



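Let us observe that, as with the Minkowski dimension, the $\mathcal{A}$-dimension
may be ramified into two weaker notions, viz. the upper $\mathcal{A}$-dimension
$\overline{\dim}_{\mathcal{A}}(Z) = \limsup_{n \to \infty} \frac{\log l_n}{\log \prod_{i=1}^{n} k_i}$
and the lower $\mathcal{A}$-dimension
$\underline{\dim}_{\mathcal{A}}(Z) = \liminf_{n \to \infty} \frac{\log l_n}{\log \prod_{i=1}^{n} k_i}$.

Basic properties of $\dim_{\mathcal{A}}$ parallel the respective properties of the
Minkowski dimension.

Proposition 4 $\dim_{\mathcal{A}}$ satisfies the following:

1. $\dim_{\mathcal{A}}(Z) \le \dim_{\mathcal{A}}(T)$ whenever $Z \subseteq T$

2. $\dim_{\mathcal{A}}(Z \cup T) = \max\{\dim_{\mathcal{A}}(Z), \dim_{\mathcal{A}}(T)\}$
in case $\mathcal{A}$ is of type $\mathbf{k}$ with $k_i \ge 2$ for infinitely many $i$

3. $\dim_{\mathcal{A}}(Z) = \dim_{\mathcal{A}}(Cl_{\Pi_{\mathcal{A}}} Z)$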

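Proof. Indeed, (i) follows by the very definition of $\dim_{\mathcal{A}}$. For (ii),
by (i) it follows that
$\dim_{\mathcal{A}}(Z \cup T) \ge \max\{\dim_{\mathcal{A}}(Z), \dim_{\mathcal{A}}(T)\}$.
To prove the converse, let us assume that
$\dim_{\mathcal{A}}(Z) \ge \dim_{\mathcal{A}}(T)$ and split infinite sequences of
natural numbers into three classes ($p_j$ denotes the number of classes of
$Ind_{a_j}$ intersecting $Z$ and $q_j$ means the same for $T$): (I) a sequence
$(n_j)_j$ falls here in case $p_{n_j} \le q_{n_j}$ for infinitely many $j$ and
$p_{n_j} > q_{n_j}$ for infinitely many $j$; (II) a sequence falls here in case
$p_{n_j} \le q_{n_j}$ for almost every $j$; (III) a sequence falls here in case
$p_{n_j} > q_{n_j}$ for almost every $j$.

We assume that $l_j$ is the number of classes of $Ind_{a_j}$ intersecting $Z \cup T$;
clearly $l_j \le p_j + q_j$ for each $j$. Now consider a subsequence $(n_j)_j$ for
which $\frac{\log l_{n_j}}{\log \prod_{i=1}^{n_j} k_i}$ converges. In case it falls
into (II), we have $l_{n_j} \le 2 q_{n_j}$ for almost every $j$ and thus

$$\lim_{j \to \infty} \frac{\log l_{n_j}}{\log \prod_{i=1}^{n_j} k_i} \le \dim_{\mathcal{A}}(T) \le \dim_{\mathcal{A}}(Z).$$

Similarly, in case the sequence falls into (III), $l_{n_j} \le 2 p_{n_j}$ for almost
every $j$ and thus
$\lim_{j \to \infty} \frac{\log l_{n_j}}{\log \prod_{i=1}^{n_j} k_i} \le \dim_{\mathcal{A}}(Z)$.

In case the sequence is in (I), by its convergence we have

$$\lim_{j \to \infty} \frac{\log l_{n_j}}{\log \prod_{i=1}^{n_j} k_i} \le \max\left\{ \lim_{u \to \infty} \frac{\log 2 q_u}{\log \prod_{i=1}^{u} k_i},\ \lim_{v \to \infty} \frac{\log 2 p_v}{\log \prod_{i=1}^{v} k_i} \right\} \le \max\{\dim_{\mathcal{A}}(Z), \dim_{\mathcal{A}}(T)\} = \dim_{\mathcal{A}}(Z)$$

where $u$, $v$ run respectively over indices $n_j$ where $p_{n_j} \le q_{n_j}$,
$p_{n_j} > q_{n_j}$. Finally, (iii) follows from the fact that
$Q \cap Cl_{\Pi_{\mathcal{A}}} Z \neq \emptyset$ if and only if
$Q \cap Z \neq \emptyset$ for every $Q$, a class of $Ind_{a_n}$, any $n$.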



31.5 Conclusions 

We have examined the notion of a fractal in the universe of an information
system, and we have defined the $\mathcal{A}$-dimension, proving its basic properties.
This paper aims at discussing two generalizations of fuzzy multisets in order
to take infinite features into account. First, a class of fuzzy multisets having an
infinite membership set for an element of the universe and finite cardinality is
introduced. The sum, union, intersection as well as most t-norm and conorm
operations except the drastic sum keep the property of the finite cardinality
of the derived set. Second, the membership sequence is generalized to a closed
set on the plane whereby both the fuzzy multiset and another fuzzification
of multisets using the fuzzy number are discussed within this framework.



32.1 Introduction 

Multisets, sometimes called bags, have been considered by many authors
(e.g., [32.3, 32.1]) and used in a number of applications. Fuzzy multisets have
also been considered by several researchers [32.6, 32.4]. An application of
fuzzy multisets is information retrieval on the Web, since an information item
may appear more than once with possibly different degrees of relevance to a
query [32.5].

This application raises interesting problems. A huge, almost infinite, number of
information items exists on the WWW. A query may retrieve so many items that a
human cannot examine all of the information. We thus observe only a small part of
the retrieved information. Such experiences lead us to consider infinite fuzzy
multisets. Infiniteness here means that, although the number of information items
may be finite, it is very large and has no fixed upper bound.

We are concerned with infinite fuzzy multisets in this paper. Infiniteness does
not mean that the universal space on which fuzzy multisets are discussed is
infinite. It means that a membership set for an element of the universe is infinite
even when the underlying crisp multisets cannot have infinite multiplicity. We
introduce a class of infinite fuzzy multisets for which the cardinality is finite,
and show that most t-norm and conorm operations for two sets in this class keep the
derived set within this class.

A second generalization of fuzzy multisets is also considered. There is
another fuzzification of a multiset using the fuzzy number. This generalization
essentially includes both fuzzifications, using the membership sets and the
fuzzy number, in a unified framework.



32.2 Multisets and Fuzzy Multisets 

A multiset $M$ of $X$ is characterized by the count function
$C_M : X \to \{0, 1, 2, \ldots\}$. Thus, $C_M(x)$ is the number of copies of the
element $x \in X$.

The following are basic relations and operations for crisp multisets:

(inclusion): $M \subseteq N \Leftrightarrow C_M(x) \le C_N(x), \ \forall x \in X$.

(equality): $M = N \Leftrightarrow C_M(x) = C_N(x), \ \forall x \in X$.

(union): $C_{M \cup N}(x) = C_M(x) \vee C_N(x)$.

(intersection): $C_{M \cap N}(x) = C_M(x) \wedge C_N(x)$.

(sum): $C_{M+N}(x) = C_M(x) + C_N(x)$.
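These crisp operations map directly onto Python's `collections.Counter`, whose `|`, `&` and `+` operators take the elementwise maximum, minimum and sum of counts. The sketch below is only an illustration; the helper `included` and the sample multisets are hypothetical.

```python
from collections import Counter

def included(M, N):
    """M is included in N when C_M(x) <= C_N(x) for every x."""
    return all(M[x] <= N[x] for x in M)

# Hypothetical multisets over X = {a, b, c}.
M = Counter({"a": 2, "b": 1})
N = Counter({"a": 1, "c": 3})

union = M | N          # elementwise maximum of counts
intersection = M & N   # elementwise minimum of counts
total = M + N          # elementwise sum of counts

print(union, intersection, total, included(intersection, M))
```

A missing key in a `Counter` has count 0, which matches the convention that $C_M(x) = 0$ for elements not in the multiset.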

It is reasonable to assume that the number $C_M(x)$ should be finite. Moreover
we assume $X$ is finite: $X = \{x_1, \ldots, x_n\}$.

A fuzzification of the multiset is to define $C_M(x)$ in terms of fuzzy numbers.
We thus use the above definitions but $C_M(x)$ and $C_N(x)$ are assumed
to be nonnegative fuzzy numbers.

A fuzzy multiset $A$ of $X$ (more often called fuzzy bag) is characterized by
the function $C_A(\cdot)$ of the same symbol, but the value $C_A(x)$ is a finite set
in $I$ [32.6]. Given $x \in X$, $C_A(x) = \{\mu, \mu', \ldots, \mu''\}$,
$\mu, \mu', \ldots, \mu'' \in I$.

For two fuzzy multisets $A$ and $B$ of $X$ such that
$C_A(x) = \{\mu, \mu', \ldots, \mu''\}$, $C_B(x) = \{\nu, \nu', \ldots, \nu'''\}$, the
sum $A + B$ is $C_{A+B}(x) = \{\mu, \mu', \ldots, \mu'', \nu, \nu', \ldots, \nu'''\}$,
but other operations need another representation called membership sequence [32.4].

A membership sequence is defined for each $C_A(x) = \{\mu, \mu', \ldots, \mu''\}$;
the set $\{\mu, \mu', \ldots, \mu''\}$ is arranged into the decreasing order denoted by
$\mu_A^1(x), \mu_A^2(x), \ldots, \mu_A^m(x)$:
$\{\mu_A^1(x), \mu_A^2(x), \ldots, \mu_A^m(x)\} = \{\mu, \mu', \ldots, \mu''\}$
($\mu_A^1(x) \ge \mu_A^2(x) \ge \cdots \ge \mu_A^m(x)$). The following are other basic
relations and operations for fuzzy multisets [32.4]; they are given in terms of the
membership sequences.

1. inclusion: $A \subseteq B \Leftrightarrow \mu_A^j(x) \le \mu_B^j(x), \ j = 1, \ldots, m, \ \forall x \in X$.

2. equality: $A = B \Leftrightarrow \mu_A^j(x) = \mu_B^j(x), \ j = 1, \ldots, m, \ \forall x \in X$.

3. union: $\mu_{A \cup B}^j(x) = \mu_A^j(x) \vee \mu_B^j(x), \ j = 1, \ldots, m, \ \forall x \in X$.

4. intersection: $\mu_{A \cap B}^j(x) = \mu_A^j(x) \wedge \mu_B^j(x), \ j = 1, \ldots, m, \ \forall x \in X$.

5. t-norm and conorm:

$\mu_{ATB}^j(x) = T(\mu_A^j(x), \mu_B^j(x)), \ j = 1, \ldots, m, \ \forall x \in X$.

$\mu_{ASB}^j(x) = S(\mu_A^j(x), \mu_B^j(x)), \ j = 1, \ldots, m, \ \forall x \in X$.
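The membership-sequence operations can be sketched in Python for a single element $x$. The code below makes one assumption that the text does not spell out: when the two sequences differ in length, the shorter one is padded with zeros so that both have the same length $m$. The function names and sample grades are illustrative only.

```python
def membership_sequence(grades):
    """Arrange the membership grades of one element in decreasing order."""
    return sorted(grades, reverse=True)

def combine(grades_a, grades_b, op):
    """Apply `op` termwise to the two membership sequences (zero-padded to equal length)."""
    a = membership_sequence(grades_a)
    b = membership_sequence(grades_b)
    m = max(len(a), len(b))
    a += [0.0] * (m - len(a))   # padding with zeros is an assumption of this sketch
    b += [0.0] * (m - len(b))
    return [op(s, t) for s, t in zip(a, b)]

def multiset_sum(grades_a, grades_b):
    """The sum A + B simply collects all grades of both multisets."""
    return membership_sequence(list(grades_a) + list(grades_b))

# Hypothetical grades of one element x in A and B.
A_x = [0.3, 0.9]
B_x = [0.5, 0.8, 0.2]

union = combine(A_x, B_x, max)
intersection = combine(A_x, B_x, min)
algebraic_product = combine(A_x, B_x, lambda s, t: s * t)   # one example of a t-norm
total = multiset_sum(A_x, B_x)
print(union, intersection, algebraic_product, total)
```

Sorting first matters: taking the maximum of unsorted grade lists would depend on the order in which the grades happen to be stored.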
Remark that there are different types of t-norms and conorms: we consider
the algebraic product and sum, the bounded product and sum, the Frank
family, the Hamacher family, the Yager family, the Sugeno family, and lastly
the drastic product and sum [32.2]. All t-norms and conorms are denoted by
single letters $T$ and $S$ except the last one; the drastic product and sum are
denoted by $T_D$ and $S_D$, respectively.



32.3 Infinite Memberships 

Even when crisp multisets cannot admit infinite values of the function $C_M(x)$,
fuzzy multisets are capable of having an infinite number of memberships. Note that
not every infinite set provides a well-defined fuzzy multiset, since an
$\alpha$-cut of a fuzzy multiset should give a crisp multiset of finite count.

Instead of the finite set, an infinite $C_A(x) = \{\mu, \mu', \ldots\}$ is used. We
assume that the members $\{\mu, \mu', \ldots\}$ of $C_A(x)$ can be arranged into the
decreasing order:

$$C_A(x) = \{\mu_A^1(x), \mu_A^2(x), \ldots\}, \quad \mu_A^1(x) \ge \mu_A^2(x) \ge \cdots$$

In order that the $\alpha$-cuts provide well-defined crisp multisets, it is necessary
and sufficient that $\mu_A^j(x) \to 0$ as $j \to \infty$, for all $x \in X$. This class
of fuzzy multisets of $X$ is denoted by $\mathcal{FM}_0(X)$.

The operations such as $A + B$, $A \cup B$, etc. are defined in the same way as
above except that $m \to \infty$ in the definitions. We have

Proposition 1. For $A, B \in \mathcal{FM}_0(X)$, $A + B \in \mathcal{FM}_0(X)$,
$A \cup B \in \mathcal{FM}_0(X)$, $A \cap B \in \mathcal{FM}_0(X)$,
$ATB \in \mathcal{FM}_0(X)$, $ASB \in \mathcal{FM}_0(X)$, except the drastic
sum: $A S_D B \in \mathcal{FM}_0(X)$ does not necessarily hold.

A basic measure of a fuzzy set $F$ is its cardinality defined by
$|F| = \sum_{x \in X} \mu_F(x)$. When a fuzzy multiset $A$ of finite membership sets
is considered, its generalization is immediate:

$$|A| = \sum_{x \in X} \sum_{j=1}^{m} \mu_A^j(x).$$

Let us consider the cardinality for the infinite memberships. We define

$$|A|_x = \sum_{j=1}^{\infty} \mu_A^j(x). \tag{32.1}$$

Then, $|A| = \sum_{x \in X} |A|_x$. It is easy to see that $|A|$ is finite if and only
if $|A|_x$ is finite for all $x \in X$, since we are considering finite $X$.

Note that for some sets, say $B \in \mathcal{FM}_0(X)$, $|B|_x = +\infty$. (Consider
$\mu_B^j(x) = 1/j$.) We hence introduce a subclass $\mathcal{FM}_1(X)$ for which the
cardinality is finite:

$$\mathcal{FM}_1(X) = \{A \in \mathcal{FM}_0(X) : |A|_x < \infty, \ \forall x \in X\}. \tag{32.2}$$

We now have the following proposition.

Proposition 2. For arbitrary $A, B \in \mathcal{FM}_1(X)$,
$A + B \in \mathcal{FM}_1(X)$, $A \cup B \in \mathcal{FM}_1(X)$,
$A \cap B \in \mathcal{FM}_1(X)$, $ATB \in \mathcal{FM}_1(X)$,
$ASB \in \mathcal{FM}_1(X)$, except the drastic sum: $A S_D B$ is not necessarily in
$\mathcal{FM}_1(X)$.

It should be noted that most, but not all, t-conorms keep the derived sets
within $\mathcal{FM}_1(X)$.

32.4 A Set-Valued Multiset 

It may seem that fuzzy multisets and fuzzification by fuzzy numbers have nothing
in common. On the contrary, there is a generalized framework that accommodates
both kinds of fuzzified multisets.

Let us notice that the membership sequence, whether it is finite or infinite,
can be regarded as a nonincreasing step function. In view of this, we first consider
a monotone nonincreasing function $\zeta_A(y; x)$ of the variable $y \in [0, +\infty)$
with values in $[0, +\infty)$, for every $x \in X$ as a parameter. Moreover the
function is assumed to satisfy $\zeta_A(y; x) \to 0$ as $y \to \infty$. Even if we do
not assume any kind of continuity, it is well known that the function
$\zeta_A(y; x)$ is continuous almost everywhere due to the monotone property. We
moreover assume, for the next step, that the function is upper-semicontinuous.

Second, this function $\zeta_A(y; x)$ is transformed to a closed set
$\nu_A(y, z; x)$ on the $(y, z)$-plane; we use the set $\nu_A(\cdot, \cdot; x)$ as the
membership for the generalized fuzzy multiset. This set is defined by

$$\nu_A(y, z; x) = \{(y, z) \in [0, \infty)^2 : \zeta_A(y; x) \ge z\}.$$

Another function $\eta_A(z; x)$ of the variable $z$, derived from $\nu_A$, is moreover
defined:

$$\eta_A(z; x) = \sup\{y : (y, z) \in \nu_A(y, z; x)\}, \quad (z \in (0, \infty)).$$

It is evident that if we define

$$\nu'_A(z, y; x) = \{(y, z) \in [0, \infty) \times (0, \infty) : \eta_A(z; x) \ge y\} \cup \{(y, 0) : y \in [0, \infty)\}, \tag{32.3}$$

then $\nu_A(y, z; x) = \nu'_A(z, y; x)$.

The generalized fuzzy multiset $A$ is characterized by $\nu_A(y, z; x)$.

For two generalized fuzzy multisets $A$ and $B$ of $X$, the basic relations and
operations are defined by the operations on the sets $\nu_A$ and $\nu_B$.

(I) (inclusion). $A \subseteq B \Leftrightarrow \nu_A(\cdot, \cdot; x) \subseteq \nu_B(\cdot, \cdot; x), \ \forall x \in X$.

(II) (equality). $A = B \Leftrightarrow \nu_A(\cdot, \cdot; x) = \nu_B(\cdot, \cdot; x), \ \forall x \in X$.

(III) (sum). Define $\eta_{A+B}(z; x) = \eta_A(z; x) + \eta_B(z; x)$ and derive
$\nu'_{A+B}$ from $\eta_{A+B}(z; x)$ using (32.3).
Define $\nu_{A+B}$ by $\nu_{A+B}(y, z; x) = \nu'_{A+B}(z, y; x)$.

(IV) (union). Define $\zeta_{A \cup B}(y; x) = \zeta_A(y; x) \vee \zeta_B(y; x)$ and
derive $\nu_{A \cup B}$ from $\zeta_{A \cup B}(y; x)$.

(V) (intersection). Define $\zeta_{A \cap B}(y; x) = \zeta_A(y; x) \wedge \zeta_B(y; x)$
and derive $\nu_{A \cap B}$ from $\zeta_{A \cap B}(y; x)$.

(VI) (t-norm and conorm). Define $\zeta_{ATB}(y; x) = t(\zeta_A(y; x), \zeta_B(y; x))$;
$\zeta_{ASB}(y; x) = s(\zeta_A(y; x), \zeta_B(y; x))$ and derive $\nu_{ATB}$ and
$\nu_{ASB}$ from $\zeta_{ATB}(y; x)$ and $\zeta_{ASB}(y; x)$, respectively.

It is evident that this generalization includes the fuzzy multisets and
positive real-valued multisets [32.1], whereas it is not obvious that this also
includes the fuzzification by the fuzzy number.

A simple mapping from the class of $C_A(x)$ as a fuzzy number to
$\zeta_A(\cdot; x)$ is used for showing that this generalization encompasses the
fuzzification of multisets using fuzzy numbers. Notice that $C_A(x)$, a fuzzy
number, consists of two upper-semicontinuous functions $L(y)$ and $R(y)$:
$C_A(x) = L(y)$, $(0 \le y \le c)$ and $C_A(x) = R(y)$, $(c < y)$, where
$L(c) = R(c) = 1$.

First, $L(y)$ is transformed into a lower-semicontinuous function $\tilde{L}(y)$
which is equal to $L(y)$ on all continuity points. Then $C_A(x)$ is mapped to
$\zeta_A(\cdot; x)$ by the next rule:

$$\zeta_A(y; x) = \begin{cases} 1 - \frac{1}{2} \tilde{L}(y), & 0 \le y < c, \\ \frac{1}{2} R(y), & c \le y. \end{cases}$$

It is immediate to see that the inclusion and equality as well as the
operations of the sum, union, and intersection for the fuzzification by the fuzzy
number are expressed in terms of the present generalization by the above mapping.



32.5 Conclusion 

We have discussed two generalizations which include infinite features in fuzzy
multisets. In the first generalization a subclass of finite cardinality has been
introduced and it has been shown that the standard set operations are performed
within this class, whereas the drastic sum, an exceptional t-conorm, may put the
derived set outside this class. More general results about t-conorms are expected.

In the second generalization two fuzzifications of the crisp multiset are
considered in a unified framework. Compared with the first generalization, the
second is more general.

Multisets have close relationships with rough sets and their generalizations
[32.7]. Theoretical aspects of fuzzy multisets in relation to rough sets
should be considered further.

We have suggested application of infinite fuzzy multisets to information
retrieval on the WWW. More effort should be concentrated on such applications
in future studies.
This paper aims at proposing and comparing two fuzzy models and a statistical
model for clustering based on $L_1$-space. Clustering methods in the
fuzzy models are the standard fuzzy c-means and an entropy regularization
method based on $L_1$-space. Furthermore, we add new variables to them for
improving the cluster division. In the statistical model, a mixture distribution
model based on $L_1$-space is proposed and the EM algorithm is applied.



33.1 Introduction 

A characteristic of methods of data clustering is that various measures of
distance and similarity between objects can be employed [33.1, 33.5]. For
example, the $L_1$-space, instead of the best-known Euclidean space, is sometimes
useful in crisp and fuzzy c-means.

Several results have been published on fuzzy c-means based on the $L_1$-space
[33.3, 33.7, 33.9], and studies are ongoing in order to improve the
method and to investigate the properties of the clusters theoretically. For
example, the method of entropy regularization and fuzzy classification
functions [33.9, 33.8] should be studied; additional variables for clustering can be
taken into account [33.6].

The aim of the present paper is to include new variables in the methods
of the standard fuzzy c-means [33.2] and the entropy fuzzy c-means [33.10]
based on the $L_1$-metric. In addition, a new mixture distribution model on
the $L_1$-space is proposed in which the EM algorithm [33.4, 33.11] is used to
estimate parameters.



33.2 Fuzzy c-Means Based on $L_1$-Space

Assume that the p-dimensional space $R^p$ is equipped with the weighted
$L_1$-norm: for $x = (x^1, \ldots, x^p)$ and $y = (y^1, \ldots, y^p)$ in $R^p$,

$$\|x - y\| = \sum_{j=1}^{p} w^j |x^j - y^j|,$$

where $(w^1, \ldots, w^p)$ is the weight vector. A set $X = \{x_1, \ldots, x_n\}$ of
objects $x_k = (x_k^1, \ldots, x_k^p) \in R^p$ should be divided into $c$ clusters.
Clusters are denoted by $G_i$ ($i = 1, \ldots, c$) or simply $i$. The center for
cluster $i$ is denoted by $v_i = (v_i^1, \ldots, v_i^p)$; we write
$V = (v_1, \ldots, v_c)$ for simplicity. The membership matrix is $U = (u_{ik})$;
$u_{ik}$ is the degree of membership of $x_k$ to cluster $i$.

The method of fuzzy c-means uses an alternative minimization of an objective
function $J(U, V)$. In addition to $U$ and $V$, we use more variables
$\alpha = (\alpha_1, \ldots, \alpha_c)$ for controlling the sizes of clusters and
$\eta = (\eta_i^j)$ for controlling the scatters of them.

We consider the following two objective functions.

$$J_{std}(U, V, \alpha, \eta) = \sum_{i=1}^{c} \alpha_i \sum_{k=1}^{n} \left( \frac{u_{ik}}{\alpha_i} \right)^m \sum_{j=1}^{p} \eta_i^j |x_k^j - v_i^j|$$

$$J_{ent}(U, V, \alpha, \eta) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik} \sum_{j=1}^{p} \eta_i^j |x_k^j - v_i^j| + \lambda^{-1} \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik} \log \frac{u_{ik}}{\alpha_i}$$

The subscripts std and ent imply that the methods are the standard fuzzy c-means
[33.2] and the method of entropy regularization [33.10, 33.8], respectively.
Each function has its parameter: $m\ (> 1)$ in $J_{std}$ and $\lambda\ (> 0)$ in
$J_{ent}$. The constraints for $U$, $\alpha$, and $\eta$ are

$$M = \{ (u_{ik}) \mid u_{ik} \in [0, 1], \ \textstyle\sum_{i=1}^{c} u_{ik} = 1, \ k = 1, \ldots, n \},$$

$$\mathcal{A} = \{ (\alpha_i) \mid \alpha_i \in [0, 1], \ \textstyle\sum_{i=1}^{c} \alpha_i = 1 \},$$

$$H = \{ (\eta_i^j) \mid \eta_i^j > 0, \ \textstyle\prod_{j=1}^{p} \eta_i^j = 1, \ i = 1, \ldots, c \}.$$

The next alternative optimization algorithm FCM is used for clustering,
in which $J = J_{ent}$ or $J = J_{std}$.

Algorithm FCM.

FCM1. Set initial values $\bar{V}$, $\bar{\alpha}$ and $\bar{\eta}$.

FCM2. Solve $\min_{U \in M} J(U, \bar{V}, \bar{\alpha}, \bar{\eta})$ and let the solution be $\bar{U}$.

FCM3. Solve $\min_{V} J(\bar{U}, V, \bar{\alpha}, \bar{\eta})$ and let the solution be $\bar{V}$.

FCM4. Solve $\min_{\alpha \in \mathcal{A}} J(\bar{U}, \bar{V}, \alpha, \bar{\eta})$ and let the solution be $\bar{\alpha}$.

FCM5. Solve $\min_{\eta \in H} J(\bar{U}, \bar{V}, \bar{\alpha}, \eta)$ and let the solution be $\bar{\eta}$.

FCM6. If the solution $(\bar{U}, \bar{V}, \bar{\alpha}, \bar{\eta})$ is convergent, stop; otherwise go to FCM2.

The optimal solutions of $U$, $\alpha$, and $\eta$ for $J = J_{ent}$ and $J = J_{std}$
are as follows. For the cluster centers $V$, we do not have a closed formula. Instead,
an efficient algorithm can be employed. For simplicity we put
$D_{ik} = \sum_{j=1}^{p} \eta_i^j |x_k^j - v_i^j|$.







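(i) $J = J_{std}$:

$$u_{ik} = \frac{\alpha_i D_{ik}^{-\frac{1}{m-1}}}{\sum_{l=1}^{c} \alpha_l D_{lk}^{-\frac{1}{m-1}}}, \qquad \alpha_i = \frac{\left( \sum_{k=1}^{n} (u_{ik})^m D_{ik} \right)^{\frac{1}{m}}}{\sum_{l=1}^{c} \left( \sum_{k=1}^{n} (u_{lk})^m D_{lk} \right)^{\frac{1}{m}}},$$

$$\eta_i^j = \frac{\left( \prod_{l=1}^{p} \sum_{k=1}^{n} (u_{ik})^m |x_k^l - v_i^l| \right)^{\frac{1}{p}}}{\sum_{k=1}^{n} (u_{ik})^m |x_k^j - v_i^j|}.$$

(ii) $J = J_{ent}$:

$$u_{ik} = \frac{\alpha_i e^{-\lambda D_{ik}}}{\sum_{l=1}^{c} \alpha_l e^{-\lambda D_{lk}}}, \qquad \alpha_i = \frac{1}{n} \sum_{k=1}^{n} u_{ik},$$

$$\eta_i^j = \frac{\left( \prod_{l=1}^{p} \sum_{k=1}^{n} u_{ik} |x_k^l - v_i^l| \right)^{\frac{1}{p}}}{\sum_{k=1}^{n} u_{ik} |x_k^j - v_i^j|}.$$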

Calculation of $V$ (cf. [33.9]).

First, $x_1^j, x_2^j, \ldots, x_{n-1}^j, x_n^j$ are sorted into the increasing order:

$$x_{q(1)}^j \le x_{q(2)}^j \le \cdots \le x_{q(n-1)}^j \le x_{q(n)}^j. \tag{33.2}$$

Algorithm C:

begin
    $S := -\sum_{k=1}^{n} (u_{ik})^m$;
    $r := 0$;
    while ($S < 0$) do begin
        $r := r + 1$;
        $S := S + 2 (u_{iq(r)})^m$;
    end;
    output $v_i^j = x_{q(r)}^j$
end.

This algorithm is for $V$ in $J_{std}$; for $J_{ent}$, $(u_{ik})^m$ should be
replaced by $u_{ik}$. Notice that this algorithm is very fast, since $O(np)$
computation is sufficient in the main loop of the iteration, apart from the initial
sorting.
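Algorithm C amounts to a weighted median: the sum $\sum_k w_k |x_k - v|$ is piecewise linear in $v$, its slope starts at $-\sum_k w_k$ and increases by $2 w_{q(r)}$ at each sorted data point, and the minimum is reached where the slope first becomes nonnegative. The Python sketch below follows that reasoning for one coordinate of one cluster; the function name `weighted_l1_center` is hypothetical, and the weights stand for $(u_{ik})^m$ or $u_{ik}$.

```python
def weighted_l1_center(values, weights):
    """Return a minimizer v of sum_k weights[k] * |values[k] - v| (a weighted median)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be nonnegative with a positive sum")
    order = sorted(range(len(values)), key=lambda k: values[k])  # the permutation q
    slope = -sum(weights)        # slope of the cost to the left of the smallest value
    r = -1
    while slope < 0:
        r += 1
        slope += 2 * weights[order[r]]   # the slope jumps by 2*w when passing a data point
    return values[order[r]]

def l1_cost(values, weights, v):
    """The objective sum_k weights[k] * |values[k] - v|, used here only for checking."""
    return sum(w * abs(x - v) for x, w in zip(values, weights))

values = [1.0, 2.0, 10.0]
print(weighted_l1_center(values, [1.0, 1.0, 1.0]))   # unweighted median
print(weighted_l1_center(values, [1.0, 1.0, 5.0]))   # heavy weight pulls the center
```

Because a minimizer can always be found among the data points, the result can be checked against the cost evaluated at each data value.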



33.3 Mixture Distribution Based on $L_1$-Space

A mixture distribution model can be used for clustering [33.5, 33.11]. Our purpose
is to develop a mixture distribution model for $L_1$-space, in contrast to
the Gaussian mixture model for the Euclidean space. Three elements are used
in clustering by a mixture distribution:

(i) the prior probability of occurrence of the cluster $G_i$: $P(G_i) = \alpha_i$,

(ii) the conditional probability of $x$ given $G_i$:
$P(x|G_i) = \int_{-\infty}^{x} p_i(x|\phi_i)\,dx$,

(iii) the probability $P(G_i|x)$ by which an observation $x$ is allotted to $G_i$.

Notice the Bayes formula:

$$P(G_i|x) = \frac{\alpha_i\, p_i(x|\phi_i)}{\sum_{j=1}^{c} \alpha_j\, p_j(x|\phi_j)}. \tag{33.3}$$

We must assume the density $p_i(x|\phi_i)$ and estimate the parameters $\alpha_i$ and
$\phi_i$ ($i = 1, \ldots, c$). Since the Gaussian distribution cannot be used in
$L_1$-space, we assume the following density function:



$$p_i(x|\phi_i) = p_i(x|\mu_i, \nu_i) = \prod_{j=1}^{p} (\cdots),$$

where the parameter $\phi_i = (\mu_i^1, \ldots, \mu_i^p, \nu_i^1, \ldots, \nu_i^p)$ is a
2p-dimensional vector.

In order to estimate the vector parameter
$\Phi = (\alpha_1, \ldots, \alpha_c, \phi_1, \ldots, \phi_c)$,
the EM algorithm [33.4, 33.11] is used. Let

$$Q(\Phi|\Phi^{(\ell)}) = \sum_{i=1}^{c} \Psi_i \log \alpha_i + \sum_{i=1}^{c} \sum_{k=1}^{n} \psi_{ik} \log p_i(x_k|\phi_i).$$

The EM algorithm.

(O) Set an initial value $\Phi^{(0)}$ for the parameter $\Phi$. Put $\ell = 0$.

Repeat (E) and (M) until convergence.

(E) Calculate $Q(\Phi|\Phi^{(\ell)})$.

(M) Solve $\max_{\Phi} Q(\Phi|\Phi^{(\ell)})$ and let the optimal solution be
$\bar{\Phi}$. Put $\ell = \ell + 1$ and $\Phi^{(\ell)} = \bar{\Phi}$.

End EM.

The solution in the step (M) is given as follows. Put

$$\psi_{ik} = \frac{\alpha_i\, p_i(x_k|\phi_i)}{\sum_{j=1}^{c} \alpha_j\, p_j(x_k|\phi_j)}, \qquad \Psi_i = \sum_{k=1}^{n} \psi_{ik},$$

evaluated at the current parameter $\Phi^{(\ell)}$.

Optimal $\alpha_i$: $\alpha_i^* = \frac{1}{n} \sum_{k=1}^{n} \psi_{ik}$ ($i = 1, \ldots, c$).

Calculation of $\mu_i^j$:
This algorithm is essentially the same as the former algorithm for calculating
the cluster centers. First the sorting (33.2) is performed. The algorithm C in
the previous section is then applied with the obvious replacement of $(u_{ik})^m$
and $(u_{iq(r)})^m$ by $\psi_{ik}$ and $\psi_{iq(r)}$, respectively.

Lastly, $\nu_i^j$ is obtained in terms of the optimal $\mu_i^j$:



n 



33.4 Conclusion 

$L_1$-based methods of the standard and entropy fuzzy c-means with additional
variables for the sizes and scatters of the clusters, as well as the mixture
distribution model, have been proposed and algorithms have been developed.
For the mixture distribution model, it has been shown that the EM algorithm
can be employed.

Future studies include application to real data. Data mining applications are
particularly promising, since binary and nominal data must be dealt with there,
and $L_1$-space is a suitable framework for such data.
We consider two generalized situations: a case when an equivalence relation is
generalized to a similarity relation and a case when a partition is generalized 
to a cover. Two interpretations of rough sets, i.e., the approximation by 
means of elementary sets and the distinction among positive, negative and 
boundary regions, are conceivable. The relations between two generalized 
situations are investigated. Rough sets are generalized based on two different 
interpretations under two different situations. Fundamental properties and 
complete definability are discussed in each generalization. 



34.1 Introduction 

Rough sets were originally proposed in the presence of an equivalence relation.
In real-world problems, however, an equivalence relation is sometimes difficult
to obtain because of the vagueness and incompleteness of human knowledge. From
this point of view, the concept of rough sets has been extended to cases where
a similarity relation or a fuzzy partition is given (see [34.1]-[34.4]).
However, different definitions of rough sets exist even under the same
generalized equivalence relation. Those definitions coincide when the
generalized equivalence relation degenerates to an equivalence relation.
Despite this difference, the reason for it has not been discussed much so far.

In this paper, we demonstrate that there are two interpretations of rough
sets and two generalized problem settings. In crisp cases, one of the two
generalized settings is the situation where a similarity relation is given
instead of an equivalence relation, and the other is the situation where a
cover is given instead of the partition associated with an equivalence
relation. Rough sets composed of lower and upper approximations are interpreted
in two different ways: as a distinction among positive, negative and boundary
elements of a given subset, and as an approximation of a given subset by means
of elementary sets obtained from a similarity relation or a cover. Restricting
ourselves to crisp cases, we discuss the relations between the two settings,
how the definitions of rough sets differ depending on the interpretation, the
fundamental properties of rough sets under those interpretations, and the
complete definability of rough sets.
Table 34.1. Fundamental properties of rough sets

(i)   $R_*(X) \subseteq X \subseteq R^*(X)$

(ii)  $R_*(\emptyset) = R^*(\emptyset) = \emptyset$, $R_*(U) = R^*(U) = U$

(iii) $R_*(X \cap Y) = R_*(X) \cap R_*(Y)$, $R^*(X \cup Y) = R^*(X) \cup R^*(Y)$

(iv)  $X \subseteq Y$ implies $R_*(X) \subseteq R_*(Y)$, $X \subseteq Y$ implies $R^*(X) \subseteq R^*(Y)$

(v)   $R_*(X \cup Y) \supseteq R_*(X) \cup R_*(Y)$, $R^*(X \cap Y) \subseteq R^*(X) \cap R^*(Y)$

(vi)  $R_*(U - X) = U - R^*(X)$, $R^*(U - X) = U - R_*(X)$

(vii) $R_*(R_*(X)) = R^*(R_*(X)) = R_*(X)$, $R^*(R^*(X)) = R_*(R^*(X)) = R^*(X)$



34.2 The Original Rough Sets 

Let $R$ be an equivalence relation in the finite universe $U$. In rough set
literature, $R$ is referred to as an indiscernibility relation and a pair
$(U, R)$ is called an approximation space. By the equivalence relation $R$, $U$
can be partitioned into a collection of elementary sets,
$U|R = \{E_1, E_2, \ldots, E_n\}$. Define $R(x)$ as
$R(x) = \{y \in U \mid yRx\}$. Then we have $x \in E_i$ if and only if
$E_i = R(x)$.

In rough sets, we consider the approximations of an arbitrary set
$X \subseteq U$ by means of elementary sets. Then the rough set of $X$ is
defined by a pair of the following lower and upper approximations:

$$R_*(X) = \{x \in U \mid R(x) \subseteq X\}, \qquad R^*(X) = \{x \in U \mid R(x) \cap X \neq \emptyset\}. \tag{34.1}$$

By the definition, $R_*(X) \subseteq R^*(X)$ holds. If $R_*(X) = R^*(X)$ holds,
then $X$ is said to be completely definable under the approximation space
$(U, R)$.
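As a small illustration of definition (34.1), the following Python sketch computes both approximations for a universe of six elements partitioned into three blocks. The helper names and the example relation are hypothetical and chosen only for this illustration.

```python
def related_set(x, universe, related):
    """R(x) = {y in U | y R x}, with `related(y, x)` deciding whether y R x."""
    return frozenset(y for y in universe if related(y, x))

def lower_approximation(X, universe, related):
    """R_*(X): elements whose class R(x) lies inside X."""
    return {x for x in universe if related_set(x, universe, related) <= X}

def upper_approximation(X, universe, related):
    """R^*(X): elements whose class R(x) meets X."""
    return {x for x in universe if related_set(x, universe, related) & X}

U = set(range(1, 7))

def same_block(y, x):
    """Equivalence relation with the elementary sets {1,2}, {3,4}, {5,6}."""
    return (y - 1) // 2 == (x - 1) // 2

X = {1, 2, 3}
lower = lower_approximation(X, U, same_block)   # {1, 2}
upper = upper_approximation(X, U, same_block)   # {1, 2, 3, 4}
```

Here $X$ is not completely definable, because the block $\{3, 4\}$ is only partly contained in $X$ and therefore falls into the ambiguous region.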

Under indiscernible circumstances given by $(U, R)$, we cannot recognize
the difference among elements in $E_i$, but only between $x \in E_i$ and
$y \in E_j$ ($i \neq j$). Thus, what we can specify is not a particular element
$x$ of $U$ but a particular elementary set $E_i$ of $U|R$. Consider an element
about which we know only that it is in $E_i$. If $E_i \subseteq R_*(X)$, we can
conclude that the element belongs to $X$. If $E_i \subseteq U - R^*(X)$, we can
conclude that the element does not belong to $X$. From those facts, $R_*(X)$
and $U - R^*(X)$ are regarded as the positive and negative regions of $X$,
respectively. $R^*(X) - R_*(X)$ is regarded as the ambiguous region.

The fundamental properties of $R_*(X)$ and $R^*(X)$ are listed in Table 34.1.

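Let

$$R_*^1(X) = \begin{cases} \bigcup_{R(x) \subseteq X} R(x), & \text{if } \exists R(x);\ R(x) \subseteq X, \\ \emptyset, & \text{otherwise,} \end{cases} \tag{34.2}$$

$$R_*^2(X) = \begin{cases} \bigcap_{R(x) \cap (U - X) \neq \emptyset} (U - R(x)), & \text{if } \exists R(x);\ R(x) \cap (U - X) \neq \emptyset, \\ U, & \text{otherwise,} \end{cases} \tag{34.3}$$

$$R^*_1(X) = \begin{cases} \bigcap_{R(x) \cap X = \emptyset} (U - R(x)), & \text{if } \exists R(x);\ R(x) \cap X = \emptyset, \\ U, & \text{otherwise,} \end{cases} \tag{34.4}$$

$$R^*_2(X) = \begin{cases} \bigcup_{R(x) \cap X \neq \emptyset} R(x), & \text{if } \exists R(x);\ R(x) \cap X \neq \emptyset, \\ \emptyset, & \text{otherwise.} \end{cases} \tag{34.5}$$

Since $R$ is an equivalence relation, we have $R_*^1(X) = R_*^2(X) = R_*(X)$ and
$R^*_1(X) = R^*_2(X) = R^*(X)$.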



34.3 Two Different Problem Settings 

An equivalence relation $R$ is identified by a partition
$U|R = \{E_1, E_2, \ldots, E_n\}$ and vice versa. From this fact, there are two
possible generalization schemes: generalization of $R$ and generalization of
$U|R$ (see [34.2]).

Generalization of $R$ is to drop and/or to weaken some of the requirements
of $R$ so that $R$ can be considered a so-called similarity relation, i.e.,
$xRy$ means '$x$ is similar to $y$'. Until now $R$ has been generalized up to a
relation which satisfies only reflexivity (see [34.4]). On the other hand,
generalization of $U|R$ is to give a cover of $U$, i.e., a class
$\mathcal{F} = \{F_1, F_2, \ldots, F_n\}$ such that $U = \bigcup_{i=1}^{n} F_i$
(see [34.1]).

Let us discuss relations between those generalizations. First consider the
case where a similarity relation $R$ is given. When $R$ is no longer symmetric,
the set of elements similar to $x$, i.e., $R(x)$, is distinct from
$R^{-1}(x) = \{y \mid xRy\}$, the set of elements to which $x$ is similar
[34.4]. If $R$ is reflexive, we can have a cover
$\mathcal{F} = \{R(x) \mid x \in U\}$. Since a similarity relation $R$ should
satisfy reflexivity, the situation with $R$ is a special case of the situation
with a cover $\mathcal{F}$.

On the other hand, when a cover $\mathcal{F} = \{F_1, F_2, \ldots, F_n\}$ is
given, we face the problem of how to produce a similarity relation $R$ such
that $\mathcal{F} = \{R(x) \mid x \in U\}$. We can solve this problem only if,
for any $x \in U$, there is a unique $F_i$ such that $x \in F_i$. However, this
is nothing but the case where $\mathcal{F}$ is a partition. Thus, there is no
$R$ satisfying $\mathcal{F} = \{R(x) \mid x \in U\}$ whenever $\mathcal{F}$ is
not a partition.

Hence, under a finite universe $U$, a problem setting with a cover
$\mathcal{F}$ seems to be more general than that with a similarity relation
$R$. This is true in the interpretation of rough sets as approximations by
means of elementary sets. However, each elementary set $F_i$ of $\mathcal{F}$
is not associated with an element $x \in U$. Because of this fact, we cannot
always say that a cover $\mathcal{F}$ is more general than a similarity
relation $R$.

Finally, we should note that $R_*$, $R_*^1$ and $R_*^2$ are no longer
equivalent in both generalized settings. Neither are $R^*$, $R^*_1$ and
$R^*_2$. We have $R^*(X) = U - R_*(U - X)$, $R^*_i(X) = U - R_*^i(U - X)$,
$i = 1, 2$ and, under the reflexivity of $R$, we obtain
$R_*^2(X) \subseteq R_*(X) \subseteq R_*^1(X)$ and
$R^*_1(X) \subseteq R^*(X) \subseteq R^*_2(X)$.
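The inclusion chains under reflexivity can be checked on a small example. The Python sketch below evaluates (34.1) to (34.5) for a reflexive but non-symmetric relation; the relation, the dictionary keys and the helper names are assumptions made for illustration.

```python
def related_set(x, universe, relation):
    """R(x) = {y in U | y R x}; `relation` is a set of pairs (y, x) with y R x."""
    return frozenset(y for y in universe if (y, x) in relation)

def union_of(sets):
    """Union of a possibly empty list of sets."""
    return set().union(*sets)

def approximations(X, universe, relation):
    """All six approximations of X, keyed by a descriptive name."""
    X, U = set(X), set(universe)
    classes = [related_set(x, U, relation) for x in U]
    return {
        "lower":  {x for x in U if related_set(x, U, relation) <= X},  # (34.1)
        "upper":  {x for x in U if related_set(x, U, relation) & X},   # (34.1)
        "lower1": union_of([c for c in classes if c <= X]),            # (34.2)
        "lower2": U - union_of([c for c in classes if c - X]),         # (34.3)
        "upper1": U - union_of([c for c in classes if not c & X]),     # (34.4)
        "upper2": union_of([c for c in classes if c & X]),             # (34.5)
    }

U = {1, 2, 3}
R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 3)}   # reflexive, not symmetric
a = approximations({1}, U, R)
# lower2 = set(), lower = lower1 = {1}; upper1 = {1}, upper = upper2 = {1, 2}
```

For $X = \{1\}$ the three lower approximations are not all equal, and neither are the three upper ones, while both inclusion chains still hold.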
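Table 34.2. Fundamental properties of $\mathcal{F}_*^1(X)$ and $\mathcal{F}^*_1(X)$

(i)   $\mathcal{F}_*^1(X) \subseteq X \subseteq \mathcal{F}^*_1(X)$

(ii)  $\mathcal{F}_*^1(\emptyset) = \mathcal{F}^*_1(\emptyset) = \emptyset$, $\mathcal{F}_*^1(U) = \mathcal{F}^*_1(U) = U$

(iii) $\mathcal{F}_*^1(X \cap Y) \subseteq \mathcal{F}_*^1(X) \cap \mathcal{F}_*^1(Y)$, $\mathcal{F}^*_1(X \cup Y) \supseteq \mathcal{F}^*_1(X) \cup \mathcal{F}^*_1(Y)$

(iv)  $X \subseteq Y$ implies $\mathcal{F}_*^1(X) \subseteq \mathcal{F}_*^1(Y)$, $X \subseteq Y$ implies $\mathcal{F}^*_1(X) \subseteq \mathcal{F}^*_1(Y)$

(v)   $\mathcal{F}_*^1(X \cup Y) \supseteq \mathcal{F}_*^1(X) \cup \mathcal{F}_*^1(Y)$, $\mathcal{F}^*_1(X \cap Y) \subseteq \mathcal{F}^*_1(X) \cap \mathcal{F}^*_1(Y)$

(vi)  $\mathcal{F}_*^1(U - X) = U - \mathcal{F}^*_1(X)$, $\mathcal{F}^*_1(U - X) = U - \mathcal{F}_*^1(X)$

(vii) $\mathcal{F}_*^1(\mathcal{F}_*^1(X)) = \mathcal{F}_*^1(X)$, $\mathcal{F}_*^1(X) \subseteq \mathcal{F}^*_1(\mathcal{F}_*^1(X)) \subseteq \mathcal{F}^*_1(X)$,
      $\mathcal{F}^*_1(\mathcal{F}^*_1(X)) = \mathcal{F}^*_1(X)$, $\mathcal{F}^*_1(X) \supseteq \mathcal{F}_*^1(\mathcal{F}^*_1(X)) \supseteq \mathcal{F}_*^1(X)$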



34.4 Approximation by Means of Elementary Sets 

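In the interpretation of rough sets as approximations of sets by means of
elementary sets, we assume a general setting, i.e., a case when a cover
$\mathcal{F} = \{F_1, F_2, \ldots, F_n\}$ is given. In this case, we should
consider $\mathcal{F}_*^1(X)$, $\mathcal{F}_*^2(X)$, $\mathcal{F}^*_1(X)$ and
$\mathcal{F}^*_2(X)$ defined by (34.2)-(34.5), substituting $F_i$ for $R(x)$,
respectively.

We can prove $\mathcal{F}_*^2(X) \subseteq \mathcal{F}_*^1(X) \subseteq X$ and
$X \subseteq \mathcal{F}^*_1(X) \subseteq \mathcal{F}^*_2(X)$. Hence,
$\mathcal{F}_*^1(X)$ and $\mathcal{F}^*_1(X)$ are better lower and upper
approximations of $X$. Thus, we define a rough set of $X$ under $\mathcal{F}$
by a pair of $\mathcal{F}_*^1(X)$ and $\mathcal{F}^*_1(X)$.

For $\mathcal{F}_*^1(X)$ and $\mathcal{F}^*_1(X)$, we have the fundamental
properties listed in Table 34.2. By the lack of disjointedness between $F_i$
and $F_j$ ($i \neq j$), none of
$\mathcal{F}_*^1(X \cap Y) \supseteq \mathcal{F}_*^1(X) \cap \mathcal{F}_*^1(Y)$,
$\mathcal{F}^*_1(X \cup Y) \subseteq \mathcal{F}^*_1(X) \cup \mathcal{F}^*_1(Y)$,
$\mathcal{F}_*^1(X) \supseteq \mathcal{F}^*_1(\mathcal{F}_*^1(X))$ and
$\mathcal{F}^*_1(X) \subseteq \mathcal{F}_*^1(\mathcal{F}^*_1(X))$ always holds.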

Complete definability of $X$ in the setting where $\mathcal{F}$ is given can be
defined as follows:

(a) $X$ is $\mathcal{F}$-inner completely definable if and only if
$\mathcal{F}_*^1(X) = X$ is satisfied.

(b) $X$ is $\mathcal{F}$-outer completely definable if and only if
$\mathcal{F}^*_1(X) = X$ is satisfied.

(c) $X$ is $\mathcal{F}$-completely definable if and only if $X$ is
$\mathcal{F}$-inner completely definable and at the same time
$\mathcal{F}$-outer completely definable.
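The cover-based approximations and the notions (a) to (c) can be illustrated with a two-element cover whose members overlap. The Python sketch below uses hypothetical function names and a hypothetical cover. It shows a set that is inner but not outer completely definable, and an instance where the inclusion for the lower approximation in property (iii) of Table 34.2 is strict.

```python
def cover_lower(X, cover):
    """Union of the cover elements contained in X, cf. (34.2)."""
    return set().union(*[F for F in cover if F <= X])

def cover_upper(X, cover, universe):
    """Complement of the union of the cover elements disjoint from X, cf. (34.4)."""
    return set(universe) - set().union(*[F for F in cover if not F & X])

U = {1, 2, 3}
cover = [{1, 2}, {2, 3}]          # overlapping elementary sets
X, Y = {1, 2}, {2, 3}

inner_definable = cover_lower(X, cover) == X        # True
outer_definable = cover_upper(X, cover, U) == X     # False: the upper approximation is U
lower_of_meet = cover_lower(X & Y, cover)           # set()
meet_of_lowers = cover_lower(X, cover) & cover_lower(Y, cover)   # {2}
```

Because the element 2 belongs to both cover elements, no cover element fits inside $X \cap Y = \{2\}$, which is why the lower approximation of the intersection is strictly smaller than the intersection of the lower approximations.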



34.5 Distinction among Three Regions 

Let $X$ be a set corresponding to a vague concept. Then not all people always
agree on the elements of $X$. A given set $X$ includes elements on whose
membership all people agree and also elements on whose membership some people
disagree. Elements of $X$ can thus be divided into unquestionable and
questionable members. In such a case, rough sets can be applied to classify
elements into three categories: positive members, negative members and
boundary members.

Let $\underline{X}$ and $\overline{X}$ be sets of positive members and possible
members, respectively. Here 'possible members' are composed of positive and boundary
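Table 34.3. Fundamental properties of $R_*^3(X)$ and $R^*_3(X)$

(i)   $R_*^3(X) \subseteq X \subseteq R^*_3(X)$

(ii)  $R_*^3(\emptyset) = R^*_3(\emptyset) = \emptyset$, $R_*^3(U) = R^*_3(U) = U$

(iii) $R_*^3(X \cap Y) = R_*^3(X) \cap R_*^3(Y)$, $R^*_3(X \cup Y) = R^*_3(X) \cup R^*_3(Y)$

(iv)  $X \subseteq Y$ implies $R_*^3(X) \subseteq R_*^3(Y)$, $X \subseteq Y$ implies $R^*_3(X) \subseteq R^*_3(Y)$

(v)   $R_*^3(X \cup Y) \supseteq R_*^3(X) \cup R_*^3(Y)$, $R^*_3(X \cap Y) \subseteq R^*_3(X) \cap R^*_3(Y)$

(vi)  $R_*^3(U - X) = U - R^*_3(X)$, $R^*_3(U - X) = U - R_*^3(X)$

(vii) $R^*_3(R_*^3(X)) \subseteq X$ does not always hold, $R_*^3(R^*_3(X)) \supseteq X$ does not always hold.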



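members. A given $X$ should satisfy
$\underline{X} \subseteq X \subseteq \overline{X}$. We assume that only
elements which are similar to a member of $\underline{X}$ can be regarded as
possible members. Then we have

$$\overline{X} \subseteq \bigcup_{y \in \underline{X}} R(y) = \{x \mid R^{-1}(x) \cap \underline{X} \neq \emptyset\}. \tag{34.6}$$

Since $U - \underline{X} = \overline{(U - X)}$ and
$\underline{(U - X)} = U - \overline{X}$, we also have

$$\underline{X} \supseteq U - \bigcup_{y \notin \overline{X}} R(y) = \{x \mid R^{-1}(x) \subseteq \overline{X}\}. \tag{34.7}$$

In our problem setting, we know only $X$ such that
$\underline{X} \subseteq X \subseteq \overline{X}$. We obtain a lower
approximation of $\underline{X}$ and an upper approximation of $\overline{X}$
as follows:

$$R_*^3(X) = \{x \mid R^{-1}(x) \subseteq X\}, \qquad R^*_3(X) = \{x \mid R^{-1}(x) \cap X \neq \emptyset\}. \tag{34.8}$$

The fundamental properties of $R_*^3(X)$ and $R^*_3(X)$ are listed in
Table 34.3. By the interpretation of lower and upper approximations,
$R_*^3(R_*^3(X))$ and $R^*_3(R^*_3(X))$ are meaningless. Compared with
Table 34.2, property (iii) is preserved in Table 34.3. Property (vii) is
preserved less well in Table 34.3 than in Table 34.2.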
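The approximations (34.8) differ from (34.1) only in using $R^{-1}(x)$ instead of $R(x)$. The Python sketch below, with hypothetical names and a hypothetical relation, evaluates them and exhibits the two failures listed as property (vii) of Table 34.3.

```python
def inverse_set(x, universe, relation):
    """R^{-1}(x) = {y in U | x R y}; `relation` holds pairs (x, y) with x R y."""
    return {y for y in universe if (x, y) in relation}

def lower3(X, universe, relation):
    """R_*^3(X) of (34.8)."""
    return {x for x in universe if inverse_set(x, universe, relation) <= set(X)}

def upper3(X, universe, relation):
    """R^*_3(X) of (34.8)."""
    return {x for x in universe if inverse_set(x, universe, relation) & set(X)}

U = {1, 2, 3}
R = {(1, 1), (2, 2), (3, 3), (1, 2)}   # reflexive; 1 is similar to 2 but not conversely

up_of_low = upper3(lower3({2}, U, R), U, R)   # {1, 2}, not contained in {2}
low_of_up = lower3(upper3({1}, U, R), U, R)   # set(), does not contain {1}
```

Both failures stem from the missing symmetry: element 1 is similar to 2, but 2 is not similar to 1.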

From (34.6) and (34.7), a family of consistent lower regions is given as
$\underline{\mathcal{X}} = \{\underline{X} \mid R_*^3(X) \subseteq \underline{X} \subseteq X \subseteq R^*_3(\underline{X})\}$.
Similarly, a family of consistent upper regions is given as
$\overline{\mathcal{X}} = \{\overline{X} \mid R_*^3(\overline{X}) \subseteq X \subseteq \overline{X} \subseteq R^*_3(X)\}$.
From (34.6) and (34.7) again, $\overline{X} \subseteq R^*_3(\underline{X})$ and
$R_*^3(\overline{X}) \subseteq \underline{X}$ should be satisfied. Thus, a
family of consistent pairs of positive and possible regions is obtained as
$C = \{(\underline{X}, \overline{X}) \mid \underline{X} \in \underline{\mathcal{X}},\ \overline{X} \in \overline{\mathcal{X}},\ \overline{X} \subseteq R^*_3(\underline{X}),\ R_*^3(\overline{X}) \subseteq \underline{X}\}$.
Note that $(X, X) \in C$ always holds.

We can define the definiteness of $X$ under a similarity relation $R$ as
follows:

(d) $X$ is said to be $R$-definite if and only if $C$ is a singleton.

(e) $X$ is said to be $R$-inner definite if and only if $\underline{X} = X$ for
all $(\underline{X}, \overline{X}) \in C$.

(f) $X$ is said to be $R$-outer definite if and only if $\overline{X} = X$ for
all $(\underline{X}, \overline{X}) \in C$.

When $X$ is $R$-definite, we have $C = \{(X, X)\}$. This implies that the
concept expressed by $X$ is precise. $X$ is $R$-definite whenever $X$ is
$R$-inner and outer definite. $X$ is $R$-inner definite if $R_*^3(X) = X$ and
$X$ is $R$-outer definite if $R^*_3(X) = X$. When $R$ is an equivalence
relation, $X$ is $R$-definite if and only if $R_*^3(X) = R^*_3(X) = X$. Thus,
the definiteness corresponds to complete definability.
35. Two Procedures for Dependencies among Attributes in a Table with
Non-deterministic Information: A Summary

Hiroshi Sakai

Department of Computer Engineering, Kyushu Institute of Technology,
Tobata, Kitakyushu 804, Japan
sakai@comp.kyutech.ac.jp

Data dependency among attributes is very important for rule generation.
We have previously proposed a dependency among attributes in a table with
non-deterministic information and developed some important algorithms, and a
procedure for dependencies has been implemented on the basis of these
algorithms. This paper proposes new algorithms and enhances the implemented
procedure. In both procedures, the manipulation of equivalence relations plays
an important role.



35.1 Preliminary 

Rough set theory has been widely applied in research areas of artificial
intelligence such as knowledge, imprecision, vagueness, learning, induction,
and so on [35.2], since it was proposed by Pawlak around 1980.

According to [35.2], we define a Deterministic Information System
$DIS = (OB, AT, \{VAL_a \mid a \in AT\}, f)$, where $OB$ is a finite set whose
elements we call objects, $AT$ is a finite set whose elements we call
attributes, $VAL_a$ is a finite set whose elements we call attribute values and
$f$ is a mapping $f : OB \times AT \to \bigcup_{a \in AT} VAL_a$ which we call a
classification function. For every pair of objects $x, y\ (x \neq y) \in OB$, if
$f(x, a) = f(y, a)$ for every $a \in AT$, we say there is a relation for $x$
and $y$, which becomes an equivalence relation over $OB$. We express the
equivalence class containing an object $x$ as $[x]$. If a set $X\ (\subset OB)$
is the union of some equivalence classes, we say $X$ is definable. Otherwise we
say $X$ is rough.

Suppose $CON\ (\subset AT)$ and $DEC\ (\subset AT)$ denote condition attributes
and decision attributes, respectively. We say that two objects
$x, y\ (x \neq y) \in OB$ are consistent for $CON$ and $DEC$ if
$f(x, a) = f(y, a)$ for every $a \in CON$ implies $f(x, a) = f(y, a)$ for every
$a \in DEC$. In case every object is consistent with the other objects in a
DIS, we say the DIS is consistent for $CON$ and $DEC$, and we see there exists
a dependency between $CON$ and $DEC$. Furthermore, we see every tuple
restricted to $CON$ and $DEC$ as a rule. In case a DIS is not consistent for
$CON$ and $DEC$, the ratio $|POS_{CON}(DEC)| / |OB|$ is applied to measure the
degree of dependency. Here, the set
$POS_{CON}(DEC) = \bigcup \{L \in eq(CON) \mid$ there exists $M \in eq(DEC)$
such that $L \subset M\}$ is called the positive region.
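The degree of dependency can be computed directly from the two partitions. The following Python sketch groups objects into equivalence classes by their attribute values and collects the positive region; the function names and the small table are hypothetical.

```python
from collections import defaultdict

def eq_classes(table, attrs):
    """Equivalence classes of objects that agree on all attributes in `attrs`."""
    groups = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj)
    return list(groups.values())

def degree_of_dependency(table, con, dec):
    """|POS_CON(DEC)| / |OB|: share of objects in CON-classes inside a DEC-class."""
    dec_classes = eq_classes(table, dec)
    positive = set()
    for L in eq_classes(table, con):
        if any(L <= M for M in dec_classes):
            positive |= L
    return len(positive) / len(table)

# Hypothetical deterministic table: objects 2 and 3 agree on A, B but differ on C.
table = {
    1: {"A": 3, "B": 1, "C": 1},
    2: {"A": 5, "B": 2, "C": 2},
    3: {"A": 5, "B": 2, "C": 1},
    4: {"A": 4, "B": 5, "C": 2},
}
degree = degree_of_dependency(table, ("A", "B"), ("C",))   # 0.5
```

A degree of 1.0 means the table is consistent for the chosen condition and decision attributes.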
35.2 Definitions of NISs

We show a framework of the Non-deterministic Information System NIS according
to [35.1]. We define a $NIS = (OB, AT, \{VAL_a \mid a \in AT\}, g)$, where $g$
is a mapping $g : OB \times AT \to P(\bigcup_{a \in AT} VAL_a)$ (the power set
of $\bigcup_{a \in AT} VAL_a$).

Every set $g(x, a)$ is interpreted as containing the actual value, although we
do not know which element it is. This is called the unknown interpretation of
incomplete information. In particular, if we do not know the attribute value at
all, we consider $g(x, a) = VAL_a$. This is called the null value
interpretation.

As for NISs, Lipski showed the modal question-answering. Orlowska and
Pawlak discussed the modal concept, especially the axiomatization of the
logic in NISs. Grzymala-Busse surveyed unknown attribute values and
studied learning from examples with unknown attribute values.

Example 1. Let us consider the following NIS and problem.



Table 35.1. A Table of a NIS

OB    A          B          C
1     3          1          1
2     5          {2,4}      2
3     {1,4,5}    5          4
4     4          5          2
5     3          5          2
6     4          5          {1,3}
7     5          4          1
8     1          {1,3,4}    1



Problem: In Table 35.1, do we see that there exists a dependency between
{A,B} and {C}? Generally, how do we deal with the dependency in every NIS,
and how can we calculate the dependency in a NIS effectively?

For this problem, we consider every possible case in the NIS. In Table 35.1,
36 (= 3*2*3*2) possible DISs are derived by replacing each $g(x, a)$ with an
element of $g(x, a)$. Generally, for every NIS, we call such a DIS a derived
DIS from the NIS. Based on the derived DISs, we propose a new dependency in a
NIS.
A Proposal of New Dependency in a NIS

Suppose there exist a NIS, all of its derived $DIS_1, \ldots, DIS_m$, condition
attributes $CON$ and decision attributes $DEC$. For two threshold values
$val_1$ and $val_2$ ($0 \le val_1, val_2 \le 1$), if the following conditions
hold then we see there exists a dependency between $CON$ and $DEC$ in the NIS.

(1) Suppose a set $P = \{DIS_i \mid DIS_i\ (1 \le i \le m)$ is consistent for
$CON$ and $DEC\}$. For this set $P$, $|P|/m \ge val_1$.

(2) $\min_i \{$degree of dependency in $DIS_i \mid 1 \le i \le m\} \ge val_2$.

This new dependency is calculated from the degree of dependency in every
derived DIS. In Example 1, suppose $val_1 = 0.8$ and $val_2 = 0.8$. Condition
(1) requires $|P|/36 \ge 0.8$, namely at least 29 derived DISs must be
consistent. Condition (2) requires that the minimal degree of dependency is at
least 0.8. As for the implementation, the simple way is to calculate the
degree of dependency for all derived DISs. However, this way is not suitable
for NISs with a large number of derived DISs. We rely on another way for
the implementation.
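For a table as small as Table 35.1, the simple way can be written out directly. The Python sketch below enumerates all derived DISs and evaluates both criteria for CON = {A, B} and DEC = {C}. The data layout and the function names are assumptions; singleton sets stand for known values.

```python
from itertools import product

# Table 35.1; every cell is the set g(x, a).
NIS = {
    1: {"A": {3}, "B": {1}, "C": {1}},
    2: {"A": {5}, "B": {2, 4}, "C": {2}},
    3: {"A": {1, 4, 5}, "B": {5}, "C": {4}},
    4: {"A": {4}, "B": {5}, "C": {2}},
    5: {"A": {3}, "B": {5}, "C": {2}},
    6: {"A": {4}, "B": {5}, "C": {1, 3}},
    7: {"A": {5}, "B": {4}, "C": {1}},
    8: {"A": {1}, "B": {1, 3, 4}, "C": {1}},
}

def derived_diss(nis):
    """Yield every DIS obtained by picking one value from each g(x, a)."""
    cells = [(obj, a, sorted(vals)) for obj, row in nis.items()
             for a, vals in row.items()]
    for choice in product(*(vals for _, _, vals in cells)):
        dis = {obj: {} for obj in nis}
        for (obj, a, _), value in zip(cells, choice):
            dis[obj][a] = value
        yield dis

def degree(dis, con, dec):
    """|POS_CON(DEC)| / |OB| for one deterministic table."""
    def key(obj, attrs):
        return tuple(dis[obj][a] for a in attrs)
    positive = [x for x in dis
                if all(key(y, dec) == key(x, dec)
                       for y in dis if key(y, con) == key(x, con))]
    return len(positive) / len(dis)

degrees = [degree(d, ("A", "B"), ("C",)) for d in derived_diss(NIS)]
consistent_ratio = sum(d == 1.0 for d in degrees) / len(degrees)
print(len(degrees), consistent_ratio, min(degrees), max(degrees))
# prints: 36 0.0 0.375 0.75
```

Objects 4 and 6 always share the values (4, 5) on {A, B} but never agree on C, so no derived DIS is consistent. With $val_1 = val_2 = 0.8$, neither condition holds.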



35.3 A Way to Obtain All Possible Equivalence Relations

We call every equivalence relation in a derived DIS a possible equivalence
relation (pe-relation), and call every element in a pe-relation a possible
equivalence class (pe-class).

Proposition 1. Suppose there exist a NIS and a set $X\ (\subset OB)$. If there
exist subsets $CL_1, \ldots, CL_m$ of $OB$ satisfying (1) and (2), $X$ is
definable in this NIS.

(1) $\bigcup_i CL_i = X$.

(2) $\{CL_1, \ldots, CL_m\}$ is a subset of a pe-relation.

According to this proposition, we check the definability of a set by finding
sets $CL_1, \ldots, CL_m$. We have already implemented this program. In order
to obtain all pe-relations, we put $X = OB$. Then, all pe-relations are
obtained as a side effect of checking the definability of the set $OB$ [35.3].



35.4 Procedure 1 for Dependencies 

Let $eq(CON)$ and $eq(DEC)$ be the equivalence relations for the condition and
decision attributes in a DIS, respectively. In this case, it is easy to
calculate the degree of dependency $|POS_{CON}(DEC)| / |OB|$ from $eq(CON)$ and
$eq(DEC)$ [35.3]. This property is applied to all pe-relations in every NIS,
and the new dependency is calculated. The following is a procedure for it.

Procedure 1

(Step 1) Prepare a data file and an attribute file. The attributes CON and
DEC are defined in this attribute file.

(Step 2) Translate them into internal expressions.

(Step 3) Pick up all pe-relations for CON and DEC, respectively.

(Step 4) Calculate criteria values by those relations.

The following is the real execution of Step 4 in Example 1. Here, CON =
{A, B} and DEC = {C}.

    % dependency
    Dependency Check [1,2] => [3]
    CRITERION 1 Degree of consistent DISs: 0.0
    CRITERION 2 Minimal Degree of Dependency: 0.375
                Maximal Degree of Dependency: 0.750
    EXEC_TIME = 0.030 (sec)
    %
35.5 Procedure 2 for Dependencies

Suppose it is necessary to check several kinds of dependencies in a NIS.
In Procedure 1, CON and DEC must be specified in Step 1, so it is necessary to
run the whole sequence from Step 1 to Step 4 for each dependency.
To make matters worse, Step 3 is time-consuming. For such a situation, we
revised Procedure 1 into Procedure 2. In Procedure 2, a merging algorithm for
equivalence relations is employed. Suppose $eq(A_1)$ and $eq(A_2)$ are the
equivalence relations for $A_1, A_2\ (\subset AT)$, respectively. The
equivalence relation $eq(A_1 \cup A_2)$ is
$\{M \subset OB \mid M = L_1 \cap L_2\ (M \neq \emptyset)$ for some
$L_1 \in eq(A_1)$ and $L_2 \in eq(A_2)\}$. Namely, an equivalence relation for
any set of attributes can be produced from $eq(\{a\})\ (a \in AT)$.
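The merging rule amounts to taking all non-empty pairwise intersections of classes. A minimal Python sketch, with a hypothetical function name and hypothetical partitions, is:

```python
def merge(eq1, eq2):
    """eq(A1 union A2): the non-empty intersections L1 & L2 of classes."""
    return [L1 & L2 for L1 in eq1 for L2 in eq2 if L1 & L2]

# Hypothetical partitions of OB = {1, 2, 3, 4} for two single attributes.
eq_a = [{1, 2, 3}, {4}]
eq_b = [{1, 2}, {3, 4}]
eq_ab = merge(eq_a, eq_b)   # [{1, 2}, {3}, {4}]
```

Since `merge` only needs the per-attribute partitions, those can be computed once and reused for every choice of CON and DEC, which is the point of Procedure 2.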

Procedure 2

(EStep 1) Prepare a data file.

(EStep 2) Translate it into internal expressions for each attribute.

(EStep 3) Make all pe-relations for each attribute.

(EStep 4) Fix condition attributes CON and decision attributes DEC, and
produce all pe-relations for CON and DEC, respectively.

(EStep 5) Calculate the two criteria values by those relations.

In this procedure, it is enough to execute EStep 2 and EStep 3 only once.
Only EStep 4 and EStep 5 have to be repeated for each pair of CON and DEC.



35.6 Execution Time of Every Method 

Now, let us see the execution time of each method for calculating the degree
of dependency. The four NISs in Table 35.2 are used, and the dependencies
between {A,B,C} and {D} are calculated. Every method is implemented in Prolog
and C on a workstation with a 450 MHz UltraSparc CPU.



Table 35.2. Definitions of NISs

NIS      |OB|    |AT|    |VAL_a| (a in AT)    Derived DISs
NIS1     10      4       10                   864
NIS2     100     4       10                   1944
NIS3     300     4       10                   3888
NIS4     1000    4       100                  7776



According to Table 35.3, Step 3 in Procedure 1 and EStep 3 in Procedure 2 are
the most time-consuming. These two steps pick up pe-relations from internal
expressions. The execution times of Step 4, EStep 4 and EStep 5 are very small
relative to the total execution time in Table 35.4. As for NIS1 and NIS2 in
Table 35.4, each execution time of the simple method for a DIS was 0.00 (sec).

Suppose it is necessary to check five kinds of dependencies among attributes
in NIS4. In the simple method and Procedure 1, it is necessary to do all
Table 35.3. The execution time (sec) of Procedure 1 and 2 for checking the
dependency between {A, B, C} and {D}. Step 2, Step 3, EStep 2 and EStep 3 are
realized in Prolog, Step 4, EStep 4 and EStep 5 in C.

NIS     Step 2   Step 3   Step 4   EStep 2   EStep 3   EStep 4   EStep 5
NIS1    0.05     0.17     0.06     0.07      0.16      0.01      0.05
NIS2    0.48     0.84     0.07     0.70      2.07      0.10      0.12
NIS3    2.60     8.67     0.07     3.61      5.96      1.03      1.00
NIS4    26.57    122.45   0.14     31.32     45.70     0.82      1.37


Table 35.4. The total execution time (sec) of the simple method, Procedure 1
and 2 for checking the dependency between {A,B,C} and {D}. The 2nd column,
Total(Simple), shows the value (execution time to calculate the degree of
dependency in a derived DIS) x (the number of all derived DISs).

NIS     Total(Simple)   Total(Procedure 1)   Total(Procedure 2)
NIS1    -               0.28                 0.29
NIS2    -               1.39                 2.99
NIS3    38.88           11.34                11.60
NIS4    933.12          149.16               79.21



steps. Therefore, it will take about 4665.60 (= 933.12 x 5) (sec) by the simple
method and about 745.80 (= 149.16 x 5) (sec) by Procedure 1, respectively. In
Procedure 2, it is enough to do EStep 2 and EStep 3 only once and to repeat
EStep 4 and EStep 5 five times. In this case, it will take about
87.97 (= 31.32 + 45.70 + 5 x (0.82 + 1.37)) (sec).
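The arithmetic above can be restated as a small cost model in Python. The timings are the NIS4 figures from Tables 35.3 and 35.4; the function names are hypothetical.

```python
# NIS4 timings in seconds, taken from Tables 35.3 and 35.4.
SIMPLE_TOTAL = 933.12
PROCEDURE1_TOTAL = 149.16
ESTEP2, ESTEP3, ESTEP4, ESTEP5 = 31.32, 45.70, 0.82, 1.37

def repeated_cost(total, k):
    """Cost when every step is redone for each of k dependencies."""
    return total * k

def procedure2_cost(k):
    """EStep 2 and EStep 3 run once; EStep 4 and EStep 5 run k times."""
    return ESTEP2 + ESTEP3 + k * (ESTEP4 + ESTEP5)

for k in (1, 5):
    print(k,
          round(repeated_cost(SIMPLE_TOTAL, k), 2),
          round(repeated_cost(PROCEDURE1_TOTAL, k), 2),
          round(procedure2_cost(k), 2))
```

The one-off cost of EStep 2 and EStep 3 dominates Procedure 2, so each additional dependency adds only about 2.19 seconds on NIS4.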



35.7 Concluding Remarks 

This paper proposed a dependency in non-deterministic information systems
and two procedures for calculating this new dependency. We conclude that
for checking a dependency in small data such as NIS1 and NIS2, all three
methods are applicable. For large data such as NIS4, however, only Procedures 1
and 2 are applicable; it will be hard to apply the simple method. For checking
several kinds of dependencies, Procedure 2 is much better than Procedure 1.
Abstract. In this paper, we suggest a method of data extraction for
constructing a speech recognition system. The proposed algorithm is based
on the Extended Simulated Annealing (ESA) algorithm. We have used Korean
text data, drawn randomly from the Internet. The Korean learning data set (LDS)
built by the proposed algorithm has an equiprobable distribution over the
Korean alphabets.



36.1. Introduction 

Speech recognition systems are trained using a learning data set that is
collected or extracted from an appropriate data bank. Up to now, however, there
has been no criterion for deciding whether a learning data set is proper for
the speech recognition system being built. Worse, training the speech
recognition system becomes harder as the amount of training data increases. To
make the training effective, we need enough training data that the recognition
system does not rely on some specific words and alphabets. Suitable training
data are required in order to set up a module with high reliability. We believe
that in proper training data each alphabet should appear with equal
probability. We propose a method of extracting an LDS (learning data set) that
is equiprobable in the pattern domain with as few elements as possible.

36.2. Domain Definition for LDS Extraction 



A Korean character is composed from 19 initial sounds, 21 medial vowels and 27
final consonants. Here, the final consonant may be omitted (FILL), so we use
28 characters in total as final consonants. Table 1 shows the Korean alphabets.
We collected the candidate data for extracting the LDS at random from the
Internet.
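The counts 19, 21 and 28 are the same ones used by the Unicode encoding of precomposed Hangul syllables, so the three alphabet indices of a character can be recovered arithmetically. The Python sketch below shows this. The function name `decompose` is hypothetical, and the source does not say how the authors decomposed their text.

```python
HANGUL_BASE = 0xAC00                 # code point of the first precomposed syllable
N_INITIAL, N_MEDIAL, N_FINAL = 19, 21, 28

def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final) indices."""
    code = ord(syllable) - HANGUL_BASE
    if not 0 <= code < N_INITIAL * N_MEDIAL * N_FINAL:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(code, N_MEDIAL * N_FINAL)
    medial, final = divmod(rest, N_FINAL)
    return initial, medial, final    # final == 0 means no final consonant (FILL)

indices = [decompose(ch) for ch in "학교"]   # [(18, 0, 1), (0, 12, 0)]
```

Counting these indices over a word list gives the per-alphabet occurrence counts on which the entropy term of the benefit function is based.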



Table 1. Korean alphabet table
The weight of a word for extraction is related to the appearance frequency of
the word and to the entropy of the Korean alphabet distribution in the selected
words. If a word occurs more often than others in the candidate data set, we
select it as a "molecule" of the learning data set, and our algorithm tries to
attain the highest entropy of the Korean alphabet distribution.



36.3. The Numerical Formula for LDS Extraction 

Benefit (B) = a * jN fi - P * E 

i j 

$\mu_1$ : weight of occurrence frequency
$\mu_2$ : weight of the length of word
$N$ : the number of words having the same occurrence frequency
$M$ : the number of words having the same length
$E$ : the entropy of the learning data set in Korean alphabets

$$E = -\sum_{k} P(k) \cdot \log P(k)$$

Benefit (B) is the criterion that we want to maximize. The learning data set
consists of the words that give the set the maximum B. $\mu_1$ is the weight of
a word. It is proportional to the appearance frequency of the word in the text.
For example, if in the whole data the word 학교 [hakgyo (school)] is repeated
2 times and the word 학생 [hakseang (student)] is repeated 1 time, we select
the word 학교 for inclusion in the LDS. $\mu_2$ is used as a parameter to
select the proper length of word. Generally, short words appear recurrently in
Korean text. If we divide the text into words at the blanks, single-letter
words appear more often than other words.

In the experiment, we set the average word length to 3, and in that case
$\mu_2$ takes its largest value. If a word is longer or shorter than 3 letters,
it is selected from the candidate data set with low probability. Fig. 1 and
Fig. 2 show the weights $\mu_1$ and $\mu_2$ used in the experiment.




Fig. 1. $\mu_1$        Fig. 2. $\mu_2$

$N$ is the number of words that have the same occurrence frequency and $M$
is the number of words that have the same length. If the word 학교 appears
3 times in the whole data, then $\mu_1$ is 1.75, as in Fig. 1. Likewise, if the
words with $\mu_1$ values equal to 1.75 appear 100 times in the LDS, then $N$
is set to 100. $\mu_2$ of the words with length 3 is 1, and $M$ is the number
of words with $\mu_2$ value set to 1 in that case. The constants $\alpha$ and
$\beta$ determine the relative importance of the two terms. $k$ is the index of
the Korean alphabets and $P(k)$ is the probability of Korean alphabet $k$ in
the whole text data. Therefore $E$ is the entropy of the Korean alphabets in
the selected words.

As a whole, the final LDS consists of frequently appearing words of appropriate
length, and the set has the maximum entropy value.
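The entropy term can be sketched in Python as follows. The base of the logarithm is not legible in the formula above, so base 2 is an assumption, and symbols are counted per character purely for illustration. The function name `alphabet_entropy` is hypothetical.

```python
import math
from collections import Counter

def alphabet_entropy(words, base=2.0):
    """E = -sum_k P(k) * log P(k) over the symbols occurring in `words`.

    The log base is an ASSUMPTION; the source formula does not show it legibly.
    """
    counts = Counter(symbol for word in words for symbol in word)
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total, base) for c in counts.values())

balanced = alphabet_entropy(["ab", "ba"])    # two symbols, equally frequent
skewed = alphabet_entropy(["aaa", "ab"])     # one symbol dominates
```

The entropy is largest when all symbols are equally frequent, which is the equiprobable distribution the extraction aims at.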



36.4. The Algorithm for Extraction of LDS 



The pre-processing is done as follows:

1. Divide the whole text data into the units (words) that we intend to recognize.

2. Remove the repeated words and calculate the values of $\mu_1$ and $\mu_2$.

3. Calculate the occurrence of each alphabet in the whole words.

The data set extraction algorithm based on ESA:

T(0) = initial temperature (T(0) >= T(i) >= T(f))
T(f) = final temperature
D = annealing schedule parameter, close to 1
LDSo = a stable learning data set; it starts with the initial LDS
LDSn = a perturbed LDSo



Annealing( )
{
    do {
        do {
            switch (Random[0,1,2])
            {
            case 0 :
                select a word w outside of LDSo
                LDSn = LDSo + w
                break
            case 1 :
                select a word w in the LDSo
                LDSn = LDSo - w
                break
            case 2 :
                select a word w1 outside of LDSo
                select a word w2 in the LDSo
                LDSn = LDSo + w1 - w2
                break
            }
            DecideAcceptOfLDSn( )
        } while (unstable state)
        T(i) *= D
    } while (T(i) > T(f))
}

DecideAcceptOfLDSn( )
{
    calculate ΔB = Benefit of LDSn - Benefit of LDSo
    if (ΔB > 0.0) LDSo = LDSn;
    else if (exp(-ΔB / T(i)) < Random[0,1])
        LDSo = LDSn;
    else
        Reject the LDSn;
}
The LDS extraction algorithm is founded on ESA [Wdlee, 1997]. We obtain the
candidate data set through the pre-processing described above. After making
the candidate data set, we choose some arbitrary words from it to build the
initial LDS and compute its initial benefit (B).
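The annealing loop can be sketched in Python as below. This is not the authors' ESA implementation. It uses the three perturbation moves of the pseudocode, but the acceptance test is the standard Metropolis rule (accept a worse set with probability exp(ΔB / T)), a fixed number of trials per temperature stands in for the "unstable state" test, and the best set seen so far is returned. All three are assumptions. The `benefit` function is supplied by the caller, and every name here is hypothetical.

```python
import math
import random

def anneal(candidates, initial, benefit, t0=1.0, tf=1e-3, d=0.95,
           steps_per_temp=50, rng=None):
    """Search for a subset of `candidates` with high benefit; return the best seen."""
    rng = rng or random.Random(0)
    current = set(initial)
    best = set(current)
    t = t0
    while t > tf:
        for _ in range(steps_per_temp):      # stand-in for "while (unstable state)"
            outside = sorted(set(candidates) - current)
            inside = sorted(current)
            move = rng.randrange(3)          # 0: add, 1: remove, 2: swap
            proposal = set(current)
            if move in (0, 2) and outside:
                proposal.add(rng.choice(outside))
            if move in (1, 2) and inside:
                proposal.discard(rng.choice(inside))
            delta = benefit(proposal) - benefit(current)
            # Standard Metropolis test (assumption): always accept improvements,
            # accept a worse proposal with probability exp(delta / t).
            if delta > 0 or rng.random() < math.exp(delta / t):
                current = proposal
                if benefit(current) > benefit(best):
                    best = set(current)
        t *= d                               # annealing schedule T(i) *= D
    return best
```

A caller would pass the candidate word list, an arbitrary initial subset, and a function implementing Benefit (B). Fixing the seed of `rng` makes a run reproducible.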



36.5. Experiment and Results

The data used in the experiment are 16871 words in total after removing the
repeated words, collected from the Internet at random. In the experiment, the
benefit of the data set does not depend much on the values of $\alpha$ and
$\beta$, but we get the maximum benefit when the number of words is about 4950.
Fig. 3 shows the change of energy (-Benefit) with time when 4219 randomly
chosen words are used to set up the initial data set. Finally, the algorithm
extracts 4946 words when the value of $\alpha$ is 0.1 and $\beta$ is 100.

In Fig. 3, the energy decreases and finally converges to a value after
some iterations.

Even if we change the initial data set to include more or fewer than 1000
words, the final result does not change much.




Fig. 3. The change of -Benefit 




Fig. 4. Alphabets appeared in initial data

Fig. 5. Alphabet after final calculation in the LDS



Fig. 4 shows the total number of occurrences of each alphabet in the candidate
LDS. Line 1 represents the initial sound alphabets, line 2 the medial vowel
alphabets and line 3 the final consonant alphabets. In Fig. 4, the first
initial sound appeared more than 30000 times and some alphabets did not appear
in the initial data set. When the algorithm is applied to such an initial data
set, that sound occurs fewer than 8000 times and the data set becomes
equiprobable over the alphabets. With this algorithm we can get a training data
set with a minimal number of words, thereby saving the time needed to train the
recognition system with a large number of words. The algorithm protects the
recognition system from being trained on some specific sound or syllable and
thus can maximize its reliability.



36.6. Conclusion 

In the experiment, the values of α and β are set by trial and error. To obtain
α and β, we experimented on the possible combinations about 5 times under the
same conditions, and the proposed algorithm was found to be sensitive to those
values. Some Korean alphabets occur more often than others in ordinary Korean
text, but the algorithm can reduce this imbalance. The learning data set (LDS)
extracted by the proposed algorithm has a regular distribution over the domain
of Korean alphabets. The proposed algorithm improves reliability through the
reformation of the speech recognition system.
Standard rough sets are defined by a partition induced by an equivalence
relation representing discernibility of elements. Equivalence relations may not
provide a realistic view of relationships between elements in real-world
applications. One may use coverings of, or non-equivalence relations on, the
universe. In this paper, the notion of weak fuzzy similarity relations, a
generalization of fuzzy similarity relations, is used to provide a more realistic
description of relationships between elements. A special type of weak fuzzy
similarity relations called conditional probability relation is discussed.
Generalized rough set approximations are proposed by using α-coverings of the
universe induced by conditional probability relations.



37.1 Introduction 

The theory of rough sets plays essential roles in many applications of data mi- 
ning and knowledge discovery [37.6]. It offers a mathematical model and tools 
for discovering hidden patterns in data, recognizing partial or total depen- 
dencies in data bases, removing redundant data, and many others [37.4, 37.6]. 
Rough set theory generalizes classical set theory by offering an alternative 
formulation of sets with imprecise boundaries. A rough set may be viewed 
as an approximate representation of a given crisp set in terms of two subsets 
derived from a partition on the universal set [37.3]. The two subsets are called 
a lower approximation and an upper approximation. 

Although rough set theory built on equivalence relations has the advantage
of being easy to analyze, it may not be a widely applicable model, as
equivalence relations may not provide a realistic view of relationships between
elements in the real world. Coverings of, or non-equivalence relations on, the
universe may be used to provide a more realistic model of rough sets. A covering
of the universe, $C = \{C_1, \ldots, C_n\}$, is a family of subsets of a non-empty
universe $U$ such that $U = \bigcup\{C_i \mid i = 1, \ldots, n\}$. The sets in $C$
may describe different types or various degrees of similarity between elements
of $U$. The interpretation and construction of subsets in a covering are some of
the fundamental issues of the covering-based formulation of rough set theory.
Crisp and fuzzy binary relations may be used for such purposes. In general,
relationships between elements in real-world applications may not necessarily be
symmetric or transitive. Recently, conditional probability relations [37.1] were
introduced for representing such non-equivalence relationships between elements.
Conditional probability relations may be considered as a generalization of fuzzy
similarity relations.

The main objective of this paper is to generalize the standard concept
of rough sets by coverings of the universe. Conditional probability relations
are used in the construction of coverings. Rough set approximations are
introduced based on α-coverings of the universe induced by the α-cuts of a
conditional probability relation. The proposed rough sets may be considered
as generalized fuzzy rough sets [37.7].



37.2 Conditional Probability Relations 

The concept of conditional probability relations was introduced in the
context of fuzzy relational databases [37.1]. It may be considered as a concrete
example of a weak fuzzy similarity relation, which in turn is a special type of
fuzzy binary relation.

Definition 37.2.1. A fuzzy similarity relation is a mapping,
$s : U \times U \to [0, 1]$, such that for $x, y, z \in U$,

(a) Reflexivity: $s(x,x) = 1$,

(b) Symmetry: $s(x,y) = s(y,x)$,

(c) Max-min transitivity: $s(x,z) \ge \max_{y \in U} \min[s(x,y), s(y,z)]$.

Definition 37.2.2. A weak fuzzy similarity relation is a mapping,
$s : U \times U \to [0, 1]$, such that for $x, y, z \in U$,

(a) Reflexivity: $s(x,x) = 1$,

(b) Conditional symmetry: if $s(x,y) > 0$ then $s(y,x) > 0$,

(c) Conditional transitivity: if $s(x,y) \ge s(y,x) > 0$ and
$s(y,z) \ge s(z,y) > 0$ then $s(x,z) \ge s(z,x)$.

Definition 37.2.3. A conditional probability relation is a mapping,
$R : U \times U \to [0, 1]$, such that for $x, y \in U$,

$$R(x,y) = P(x \mid y) = P(y \to x) = \frac{|x \cap y|}{|y|},$$

where $R(x,y)$ means the degree $y$ supports $x$ or the degree $y$ is similar to $x$.

By definition, a fuzzy similarity relation is regarded as a special case (or 
type) of weak fuzzy similarity relation, and a conditional probability relation 
is an example of weak fuzzy similarity relations. The conditional probability 
relations may be used as a basis for representing the degree of similarity
relationships between elements in the universe U. In the definition of conditional
probability relations, the probability values may be estimated based on the
semantical relationships between elements by using the epistemological or 
subjective view of probability theory. When objects in U are represented by 
sets of features or attributes as in the case of binary information tables, we 
have a simple procedure for estimating the conditional probability relation 
as shown in Definition 37.2.3, where | • | denotes the cardinality of a set. 

The notion of binary information tables can be easily generalized to fuzzy
information tables by allowing a number in the unit interval [0, 1] for each
cell of the table. The number is the degree to which an element has a
particular attribute. Each object is represented as a fuzzy set of attributes. The
degree of similarity of two objects can be calculated by a conditional probability
relation on fuzzy sets [37.1, 37.2]. In this case, $|x| = \sum_{a \in At} \mu_x(a)$,
where $\mu_x$ is the membership function of $x$ over a set of attributes $At$, and
intersection is defined by minimum.



Definition 37.2.4. Let $\mu_x$ and $\mu_y$ be two fuzzy sets over a set of
attributes $At$ for two elements $x$ and $y$ of a universe of objects $U$. A fuzzy
conditional probability relation is defined by:

$$R(x,y) = \frac{\sum_{a \in At} \min\{\mu_x(a), \mu_y(a)\}}{\sum_{a \in At} \mu_y(a)}.$$



It can be easily verified that R satisfies properties of a weak fuzzy simi- 
larity relation. Additional properties of similarity as defined by conditional 
probability relations can be found in [37.1]. 
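
As a minimal sketch of the two relations in this section, the following Python fragment computes the crisp relation of Definition 37.2.3 on feature sets and the fuzzy relation of Definition 37.2.4 on membership dictionaries. The function names, the dictionary representation of fuzzy sets, and the sample data are illustrative assumptions, not part of the original text; the denominator is assumed to be non-zero.

```python
def cond_prob_relation(x, y):
    """R(x, y) = |x ∩ y| / |y| for crisp feature sets (y assumed non-empty)."""
    return len(x & y) / len(y)


def fuzzy_cond_prob_relation(mu_x, mu_y):
    """R(x, y) = sum_a min(mu_x(a), mu_y(a)) / sum_a mu_y(a).

    Fuzzy sets are given as dicts attribute -> membership degree;
    a missing attribute is treated as membership 0 (an assumption).
    """
    attrs = set(mu_x) | set(mu_y)
    num = sum(min(mu_x.get(a, 0.0), mu_y.get(a, 0.0)) for a in attrs)
    den = sum(mu_y.get(a, 0.0) for a in attrs)
    return num / den


# Hypothetical objects described by attribute sets.
x = {"a1", "a2", "a3"}
y = {"a2", "a3"}
r_xy = cond_prob_relation(x, y)  # degree to which y supports x
r_yx = cond_prob_relation(y, x)  # degree to which x supports y

# Hypothetical fuzzy descriptions of two objects.
mu_x = {"a1": 1.0, "a2": 0.5}
mu_y = {"a1": 0.5, "a2": 0.5, "a3": 1.0}
f_xy = fuzzy_cond_prob_relation(mu_x, mu_y)
f_yx = fuzzy_cond_prob_relation(mu_y, mu_x)
```

In this example the relation is not symmetric (r_xy differs from r_yx), which is the behaviour that separates it from a fuzzy similarity relation.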



37.3 Generalized Rough Sets Approximation 

From weak fuzzy similarity relations and conditional probability relations, co- 
verings of the universe can be defined and interpreted. The standard concept 
of rough sets can thus be generalized based on coverings of universe. 

Definition 37.3.1. Let $U$ be a non-empty universe, and $R$ be a conditional
probability relation on $U$. For any element $x \in U$, $R_s^\alpha(x)$ and
$R_p^\alpha(x)$ are defined as the set of elements that support $x$ and the set of
elements that are supported by $x$, respectively, to a degree of at least
$\alpha \in [0, 1]$, as follows:

$$R_s^\alpha(x) = \{y \in U \mid R(x,y) \ge \alpha\}, \qquad R_p^\alpha(x) = \{y \in U \mid R(y,x) \ge \alpha\}.$$

The set $R_s^\alpha(x)$ can also be interpreted as consisting of elements that are
similar to $x$, while $R_p^\alpha(x)$ consists of elements to which $x$ is similar.
By reflexivity, it follows that we can construct two coverings of the universe,
$\{R_s^\alpha(x) \mid x \in U\}$ and $\{R_p^\alpha(x) \mid x \in U\}$. By extending
standard rough sets, we obtain two pairs of generalized rough set approximations.
Definition 37.3.2. For a subset $A \subseteq U$, we define two pairs of generalized
rough set approximations:

(i) element-oriented generalization:

$$L_e^\alpha(A) = \{x \in U \mid R_s^\alpha(x) \subseteq A\},$$
$$U_e^\alpha(A) = \{x \in U \mid R_s^\alpha(x) \cap A \neq \emptyset\}.$$

(ii) similarity-class-oriented generalization:

$$L_c^\alpha(A) = \bigcup\{R_s^\alpha(x) \mid R_s^\alpha(x) \subseteq A,\ x \in U\},$$
$$U_c^\alpha(A) = \bigcup\{R_s^\alpha(x) \mid R_s^\alpha(x) \cap A \neq \emptyset,\ x \in U\}.$$

In Definition 37.3.2(i), the lower approximation consists of those elements 
in U whose similarity classes are contained in A. The upper approximation 
consists of those elements whose similarity classes overlap with A. In Defini- 
tion 37.3.2(ii), the lower approximation is the union of all similarity classes 
that are contained in A. The upper approximation is the union of all similarity 
classes that overlap with $A$. Relationships among these approximations
can be represented by:

$$L_e^\alpha(A) \subseteq L_c^\alpha(A) \subseteq A \subseteq U_e^\alpha(A) \subseteq U_c^\alpha(A).$$

The difference between lower and upper approximations is the boundary
region with respect to $A$:

$$Bnd_e^\alpha(A) = U_e^\alpha(A) - L_e^\alpha(A), \qquad Bnd_c^\alpha(A) = U_c^\alpha(A) - L_c^\alpha(A).$$

Similarly, one can define rough set approximations based on the covering
$\{R_p^\alpha(x) \mid x \in U\}$.
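
The two pairs of approximations can be illustrated with a small Python sketch. It assumes crisp objects described by feature sets, uses the conditional probability relation R(x, y) = |x ∩ y| / |y|, and builds the class of elements supporting x to degree at least α. All names (`support_class`, `approximations`) and the four sample objects are hypothetical choices made for this illustration.

```python
def cond_prob(x, y):
    """R(x, y) = |x ∩ y| / |y| on crisp feature sets."""
    return len(x & y) / len(y)


def support_class(universe, features, x, alpha):
    """Elements y that support x to a degree of at least alpha."""
    return frozenset(
        y for y in universe if cond_prob(features[x], features[y]) >= alpha
    )


def approximations(universe, features, a, alpha):
    """Return (element-oriented lower, class-oriented lower,
    element-oriented upper, class-oriented upper) approximations of a."""
    classes = {x: support_class(universe, features, x, alpha) for x in universe}
    lower_e = {x for x in universe if classes[x] <= a}
    upper_e = {x for x in universe if classes[x] & a}
    lower_c = set().union(*(classes[x] for x in universe if classes[x] <= a))
    upper_c = set().union(*(classes[x] for x in universe if classes[x] & a))
    return lower_e, lower_c, upper_e, upper_c


# Hypothetical binary information table: object -> set of attributes it has.
features = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"c"},
    "o4": {"d"},
}
universe = set(features)
target = {"o1", "o2"}

lower_e, lower_c, upper_e, upper_c = approximations(universe, features, target, 0.5)
```

For this data the inclusion chain stated above can be checked directly: the element-oriented lower approximation is contained in the class-oriented one, which is contained in the target set, and the two upper approximations follow in the same order.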

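The pair $(L_e^\alpha, U_e^\alpha)$ may be interpreted as a pair of set-theoretic
operators on subsets of the universe. They are referred to as rough set
approximation operators [37.8]. By combining with other set-theoretic operators
such as $\neg$, $\cup$, and $\cap$, we have the following results:

(re0) $L_e^\alpha(A) = \neg U_e^\alpha(\neg A)$,
      $U_e^\alpha(A) = \neg L_e^\alpha(\neg A)$,

(re1) $L_e^\alpha(A) \subseteq A \subseteq U_e^\alpha(A)$,

(re2) $L_e^\alpha(\emptyset) = U_e^\alpha(\emptyset) = \emptyset$,

(re3) $L_e^\alpha(U) = U_e^\alpha(U) = U$,

(re4) $L_e^\alpha(A \cap B) = L_e^\alpha(A) \cap L_e^\alpha(B)$,
      $U_e^\alpha(A \cap B) \subseteq U_e^\alpha(A) \cap U_e^\alpha(B)$,

(re5) $L_e^\alpha(A \cup B) \supseteq L_e^\alpha(A) \cup L_e^\alpha(B)$,
      $U_e^\alpha(A \cup B) = U_e^\alpha(A) \cup U_e^\alpha(B)$,

(re6) $A \neq \emptyset \Rightarrow U_e^0(A) = U$,

(re7) $A \subset U \Rightarrow L_e^0(A) = \emptyset$,

(re8) $\alpha \le \beta \Rightarrow [U_e^\beta(A) \subseteq U_e^\alpha(A),\ L_e^\alpha(A) \subseteq L_e^\beta(A)]$,

(re9) $A \subseteq B \Rightarrow [U_e^\alpha(A) \subseteq U_e^\alpha(B),\ L_e^\alpha(A) \subseteq L_e^\alpha(B)]$.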



(re0) shows that lower and upper approximations are dual operators with
respect to set complement. (re2) and (re3) provide two boundary conditions.
(re4) and (re5) may be considered as weak distributivity and distributivity
over set intersection and union, respectively. When $\alpha = 0$, (re6) and (re7)
show that the upper and lower approximations of a non-empty set $A \subset U$ are
equal to $U$ and $\emptyset$, respectively. (re8) shows that if the value of
$\alpha$ is larger, then the lower approximation is also bigger, but the upper
approximation is smaller. (re9) indicates the consistency of inclusive sets.

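Lower and upper approximations of Definition 37.3.2(ii), the pair
$(L_c^\alpha, U_c^\alpha)$, satisfy the following properties:

(rc0) $L_c^\alpha(A) \subseteq A \subseteq U_c^\alpha(A)$,

(rc1) $L_c^\alpha(\emptyset) = U_c^\alpha(\emptyset) = \emptyset$,

(rc2) $L_c^\alpha(U) = U_c^\alpha(U) = U$,

(rc3) $L_c^\alpha(A \cap B) \subseteq L_c^\alpha(A) \cap L_c^\alpha(B)$,
      $U_c^\alpha(A \cap B) \subseteq U_c^\alpha(A) \cap U_c^\alpha(B)$,

(rc4) $L_c^\alpha(A \cup B) \supseteq L_c^\alpha(A) \cup L_c^\alpha(B)$,
      $U_c^\alpha(A \cup B) = U_c^\alpha(A) \cup U_c^\alpha(B)$,

(rc5) $L_c^\alpha(A) = L_c^\alpha(L_c^\alpha(A))$,
      $U_c^\alpha(A) = L_c^\alpha(U_c^\alpha(A))$,

(rc6) $A \neq \emptyset \Rightarrow U_c^0(A) = U$,

(rc7) $A \subset U \Rightarrow L_c^0(A) = \emptyset$,

(rc8) $\alpha \le \beta \Rightarrow [U_c^\beta(A) \subseteq U_c^\alpha(A),\ L_c^\alpha(A) \subseteq L_c^\beta(A)]$,

(rc9) $A \subseteq B \Rightarrow [U_c^\alpha(A) \subseteq U_c^\alpha(B),\ L_c^\alpha(A) \subseteq L_c^\alpha(B)]$.

It should be pointed out that they are not a pair of dual operators. Property
(rc5) indicates that the results of iterative operations of both lower and
upper approximation operators are the same as a single application.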



37.4 Conclusions 

In this paper, we introduce the notion of weak fuzzy similarity relations.
Two examples of such relations, conditional probability relations and fuzzy
conditional probability relations, are suggested for constructing and
interpreting coverings of the universe. Based on such coverings, we generalize
the standard rough set approximations. Two pairs of lower and upper
approximation operators are suggested and studied, and their properties are examined.
Many real world problems deal with ordering of objects instead of classifying
objects, although the majority of research in machine learning and data mining
has been focused on the latter. In this paper, we formulate the problem of
mining ordering rules as finding associations between orderings of attribute
values and the overall ordering of objects. An example of ordering rules may
state that “if the value of an object x on an attribute a is ordered ahead of the
value of another object y on the same attribute, then x is ordered ahead of
y”. For mining ordering rules, the notion of information tables is generalized
to ordered information tables by adding order relations on attribute values.
Such a table can be transformed into a binary information table, on which
any standard data mining algorithm can be used.



38.1 Introduction 

In real world situations, we may be faced with many problems that are not
simply classification [38.1, 38.4]. One such type of problem is the ordering
of objects. Two familiar examples of ordering problems are the ranking of
universities and the ranking of the consumer products produced by different
manufacturers. In both examples, we have a set of attributes that are used to
describe the objects under consideration, and an overall ranking of objects.
Consider the example of ranking consumer products. Attributes may be the
price of the products, warranty of the products, and other information. The
values of a particular attribute, say the price, naturally induce an ordering
of objects. The overall ranking of products may be produced by the market
shares of different manufacturers. The orderings of objects by attribute values
may not necessarily be the same as the overall ordering of objects.

The problem of mining ordering rules can be stated as follows. There is a 
set of objects described by a set of attributes. There is an ordering on values 
of each attribute, and there is also an overall ordering of objects. The overall 
ordering may be given by experts or obtained from other information, either 
dependent or independent of the orderings of objects according to their attri- 
bute values. We are interested in mining the association between the overall 
ordering and the individual orderings induced by different attributes. More 
specifically, we want to derive ordering rules exemplified by the statement 
that “if the value of an object x on an attribute a is ordered ahead of the
value of another object y on the same attribute, then x is ordered ahead of y”.
In this setting, a number of important issues arise. It would be interesting to
know which attributes play more important roles in determining the overall
ordering, and which attributes do not contribute at all to the overall ordering. 
It would also be useful to know which subset of attributes would be sufficient 
to determine the overall ordering. The dependency information of attributes 
may also be valuable. 

For mining ordering rules, we first introduce the notion of ordered infor- 
mation tables as a generalization of information tables. We then transform 
an ordered information table into a binary information table, on which any 
standard data mining and machine learning algorithms can be applied. Typi- 
cally, an ordering rule may not be exact. In order to capture the uncertainty 
associated with ordering rules, two quantitative measures are used. They are 
the accuracy and the coverage of the rules [38.5, 38.7]. The former deals with 
the correctness of the rules, and the latter represents the extent to which the 
rule covers the positive instances. 

Ordered information tables are related to ordinal information systems 
proposed and studied by Iwinski [38.3]. Mining ordering rules has been stu- 
died by Greco, Matarazzo and Slowinski [38.2]. Based on these studies, the 
main objective of the present paper is to precisely define and formulate the 
problem of mining ordering rules. 



38.2 Ordered Information Tables 

Formally, an ordered information table is defined by:

$$OIT = (U, At, \{V_a \mid a \in At\}, \{I_a \mid a \in At\}, \{\succ_a \mid a \in At\}),$$

where

$U$ is a finite nonempty set of objects,

$At$ is a finite nonempty set of attributes,

$V_a$ is a nonempty set of values for $a \in At$,

$I_a : U \to V_a$ is an information function,

$\succ_a \subseteq V_a \times V_a$ is an order relation on $V_a$.

Each information function $I_a$ is a total function that maps an object of $U$ to
exactly one value in $V_a$. An ordered information table can be conveniently
given in a tabular form: the rows correspond to objects of the universe, the
columns correspond to a set of attributes, and each cell is the value of an
object with respect to an attribute. The order relations can be interpreted as
additional semantic information about the table.

An order relation should satisfy certain conditions. We consider the
following two properties [38.6]:

Asymmetry: $x \succ y \Rightarrow \neg(y \succ x)$,

Negative transitivity: $[\neg(x \succ y), \neg(y \succ z)] \Rightarrow \neg(x \succ z)$.
An order relation satisfying these properties is called a weak order. An
important implication of a weak order is that the following relation,

$$x \sim y \iff [\neg(x \succ y), \neg(y \succ x)], \tag{38.1}$$

is an equivalence relation. For two elements, if $x \sim y$ we say $x$ and $y$ are
indiscernible by $\succ$. The equivalence relation $\sim$ induces a partition
$U/{\sim}$ on $U$, and an order relation $\succ^*$ on $U/{\sim}$ can be defined by:

$$[x]_\sim \succ^* [y]_\sim \iff x \succ y, \tag{38.2}$$

where $[x]_\sim$ is the equivalence class containing $x$. Moreover, $\succ^*$ is a
linear order [38.6]. Any two distinct equivalence classes of $U/{\sim}$ can be
compared. It is therefore possible to arrange the elements into levels, with each
level consisting of indiscernible elements defined by $\succ$. For a weak order,
$\neg(x \succ y)$ can be written as $y \succeq x$ or $x \preceq y$, which means
$y \succ x$ or $y \sim x$. For any two elements $x$ and $y$, we have either
$x \succ y$ or $y \succeq x$, but not both.
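
The arrangement into levels can be sketched in Python as follows. The sketch assumes the order relation is supplied as a predicate `ahead(x, y)`; it first checks asymmetry and negative transitivity by brute force, then groups elements by how many elements are ranked ahead of them, which is constant within an indiscernibility class of a weak order. The function names and the score-based example are illustrative assumptions.

```python
from collections import defaultdict


def is_weak_order(items, ahead):
    """Brute-force check of asymmetry and negative transitivity."""
    asymmetric = all(
        not (ahead(x, y) and ahead(y, x)) for x in items for y in items
    )
    # [not x>y and not y>z] implies not x>z, written as a disjunction.
    negatively_transitive = all(
        ahead(x, y) or ahead(y, z) or not ahead(x, z)
        for x in items for y in items for z in items
    )
    return asymmetric and negatively_transitive


def levels(items, ahead):
    """Arrange items into levels, best level first."""
    groups = defaultdict(list)
    for x in items:
        n_ahead = sum(1 for y in items if ahead(y, x))
        groups[n_ahead].append(x)
    return [groups[k] for k in sorted(groups)]


# Hypothetical example: a higher score means "ranked ahead".
score = {"a": 2, "b": 3, "c": 2, "d": 1}
items = list(score)


def ahead(x, y):
    return score[x] > score[y]


weak = is_weak_order(items, ahead)
ranked = levels(items, ahead)
```

Here "a" and "c" are indiscernible and share a level, while the levels themselves are linearly ordered.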

We assume that all order relations are weak orders. An order relation on
values of an attribute $a$ naturally induces an ordering of objects:

$$x \succ_{\{a\}} y \iff I_a(x) \succ_a I_a(y), \tag{38.3}$$

where $\succ_{\{a\}}$ denotes an order relation on $U$ induced by the attribute $a$.
An object $x$ is ranked ahead of another object $y$ if and only if the value of $x$
on the attribute $a$ is ranked ahead of the value of $y$ on $a$. The relation
$\succ_{\{a\}}$ has exactly the same properties as $\succ_a$. For simplicity, we
also assume that there is a special attribute, called the decision attribute. The
ordering of objects by the decision attribute is denoted by $\succ_{\{o\}}$ and is
called the overall ordering of objects. For a subset of attributes
$A \subseteq At$, we define:

$$x \succ_A y \iff \forall a \in A\,[I_a(x) \succ_a I_a(y)] \iff \bigwedge_{a \in A} I_a(x) \succ_a I_a(y), \qquad \succ_A = \bigcap_{a \in A} \succ_{\{a\}}. \tag{38.4}$$

That is, $x$ is ranked ahead of $y$ if and only if $x$ is ranked ahead of $y$
according to all attributes in $A$.



38.3 Mining Ordering Rules

With an ordered information table, we are interested in finding ordering rules
of the form $\phi \Rightarrow \psi$, where $\phi$ and $\psi$ are expressions
regarding the ordering of objects based on certain attributes. For an attribute
$a$, we can construct two atomic expressions $(a, \succ)$ and $(a, \not\succ)$. The
former indicates that objects are ordered based on $\succ_{\{a\}}$ and the latter
indicates that objects are ordered based on $\not\succ_{\{a\}}$. A set of
expressions can be obtained from atomic expressions through the application of
the logic connectives $\wedge$ and $\vee$. Consider an ordering rule,

$$(a, \succ) \wedge (b, \not\succ) \Rightarrow (c, \succ).$$

It can be re-expressed as,

$$x \succ_{\{a\}} y \wedge x \not\succ_{\{b\}} y \Rightarrow x \succ_{\{c\}} y,$$

and paraphrased as follows. For two arbitrary objects x and y, if x is ranked
ahead of y by attribute a, and at the same time, x is not ranked ahead of y
by attribute b, then x is ranked ahead of y by attribute c.

The meanings of expressions are defined by:

(m1). $m((a, \succ)) = \{(x,y) \in U \times U \mid x \succ_{\{a\}} y\}$,

(m2). $m((a, \not\succ)) = \{(x,y) \in U \times U \mid x \not\succ_{\{a\}} y\}$,

(m3). $m(\neg\phi) = -m(\phi)$,

(m4). $m(\phi \wedge \psi) = m(\phi) \cap m(\psi)$,

(m5). $m(\phi \vee \psi) = m(\phi) \cup m(\psi)$.

A pair $(x, y) \in m(\phi)$ is said to satisfy the expression $\phi$. In terms of
the meanings of expressions, we can have many conditional probabilistic
interpretations for ordering rules [38.7]. We choose to use two measures called
accuracy and coverage, which are defined by [38.5]:

$$accuracy(\phi \Rightarrow \psi) = \frac{|m(\phi \wedge \psi)|}{|m(\phi)|}, \qquad coverage(\phi \Rightarrow \psi) = \frac{|m(\phi \wedge \psi)|}{|m(\psi)|}, \tag{38.5}$$



where $|\cdot|$ denotes the cardinality of a set. While the accuracy reflects the
correctness of the rule, the coverage reflects the applicability of the rule. If
$accuracy(\phi \Rightarrow \psi) = 1$, the orderings by $\phi$ would determine the
orderings by $\psi$. We thus have a strong association between the two orderings.
A smaller value of accuracy indicates a weaker association. An ordering rule with
higher coverage suggests that ordering of more pairs of objects can be derived
from the rule. The accuracy and coverage are not independent of each other, as
both are related to the quantity $|m(\phi \wedge \psi)|$. It is desirable for a
rule to be accurate as well as to have a high degree of coverage. In general, one
may observe a trade-off between accuracy and coverage. A rule with higher
coverage may have a lower accuracy, while a rule with higher accuracy may
have a lower coverage.
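
To make the two measures concrete, the following Python sketch computes the accuracy and coverage of a rule of the form "ranked ahead on one attribute implies ranked ahead on another" over all ordered pairs of distinct objects. It assumes numeric attribute values where a larger value means "ranked ahead"; the table, the attribute names, and the helper functions are hypothetical.

```python
from itertools import permutations


def meaning(table, attribute):
    """Pairs (x, y) such that x is ranked ahead of y on the attribute.

    Assumption of this sketch: a larger value is ranked ahead.
    """
    return {
        (x, y)
        for x, y in permutations(table, 2)
        if table[x][attribute] > table[y][attribute]
    }


def accuracy(m_phi, m_psi):
    """|m(phi and psi)| / |m(phi)|."""
    return len(m_phi & m_psi) / len(m_phi)


def coverage(m_phi, m_psi):
    """|m(phi and psi)| / |m(psi)|."""
    return len(m_phi & m_psi) / len(m_psi)


# Hypothetical ordered information table.
table = {
    "p1": {"warranty": 3, "share": 4},
    "p2": {"warranty": 2, "share": 3},
    "p3": {"warranty": 2, "share": 1},
    "p4": {"warranty": 1, "share": 2},
}

m_warranty = meaning(table, "warranty")  # condition of the rule
m_share = meaning(table, "share")        # conclusion of the rule

acc = accuracy(m_warranty, m_share)
cov = coverage(m_warranty, m_share)
```

In this table 5 pairs satisfy the condition, 6 pairs satisfy the conclusion, and 4 pairs satisfy both, so the rule is fairly accurate but covers only part of the overall ordering.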

From an ordered information table, we can construct a binary information
table. We consider all pairs of objects, that is, the Cartesian product
$U \times U$. The information function is defined by:

$$I_a(x,y) = \begin{cases} 1, & x \succ_{\{a\}} y, \\ 0, & x \not\succ_{\{a\}} y. \end{cases} \tag{38.6}$$

The value 1 corresponds to the atomic expression $(a, \succ)$ and the value 0
corresponds to the atomic expression $(a, \not\succ)$. Statements in an ordered
information table can be translated into equivalent statements in the binary
information table, and vice versa. For example, a pair $(x, y)$ satisfies the
expression $(a, \succ)$ if and only if it satisfies the expression
$I_a(x, y) = 1$. In other words, the statement $x \succ_{\{a\}} y$ can be
translated into an equivalent statement $I_a(x, y) = 1$. In the translation
process, we will not consider object pairs of the form $(x, x)$,
as we are not interested in them.
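
The translation can be sketched in a few lines of Python. Each ordered pair of distinct objects becomes one row of the binary table, and each attribute value is 1 when the first object is ranked ahead of the second on that attribute and 0 otherwise. As before, "larger value is ranked ahead" is an assumption of the sketch, and the table and function name are hypothetical.

```python
from itertools import permutations


def binary_table(table, attributes):
    """Translate an ordered information table into a binary one.

    Rows are ordered pairs (x, y) with x != y; the entry for attribute a
    is 1 if x is ranked ahead of y on a (here: larger value), else 0.
    """
    return {
        (x, y): {a: int(table[x][a] > table[y][a]) for a in attributes}
        for x, y in permutations(table, 2)
    }


# Hypothetical ordered information table with a decision attribute "overall".
table = {
    "p1": {"price": 1, "overall": 3},
    "p2": {"price": 2, "overall": 2},
    "p3": {"price": 2, "overall": 1},
}

rows = binary_table(table, ["price", "overall"])
```

The resulting dictionary of rows can be handed to any rule learner that accepts binary attributes, with "overall" playing the role of the class label.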

The interpretation of an ordered information table and the translation
to a binary information table are crucial for mining ordering rules. Once
we obtain the binary information table, any standard machine learning and
data mining algorithms can be used to mine ordering rules. One may also
use other types of translation methods. For example, we may consider two
strict order relations $\succ$ and $\prec$ instead of $\succ$ and $\not\succ$.
Alternatively, one may translate an ordered information table into a three-valued
information table, corresponding to $\succ$, $\prec$ and $\sim$. It is important
to realize that the framework presented in this paper can be easily applied with
very simple modification.



38.4 Conclusion 

Ordering of objects is a fundamental issue in human decision making and may 
play a significant role in the design of intelligent information systems. This 
problem is considered from the perspective of data mining. The commonly 
used attribute value approaches are extended by introducing order relations 
on attribute values. Mining ordering rules is formulated as the process of 
finding associations between orderings on attribute values and the overall 
ordering of objects. These ordering rules tell us, or explain, how objects 
should be ranked according to orderings on their attribute values. 

Our main contribution is the formulation of the problem of mining or- 
dering rules, and the translation of the problem to existing data mining 
problems. Consequently, one can directly apply any existing data mining 
algorithms for mining ordering rules. Depending on the specific problem, one 
may use different translation methods. 
Probability measures are well-defined measures that satisfy additivity. However,
they are somewhat restrictive because of the additivity condition. Fuzzy measures
that do not satisfy additivity have been proposed as substitute measures. Among
them, only belief functions involve a density function. In this paper,
we propose two density functions by extending values of probability functions
to interval values, which do not satisfy additivity. According to the definition
of interval probability functions, lower and upper probabilities are defined,
respectively. A combination rule and a conditional probability can also be well
defined. The properties of the proposed measure are clarified.



39.1 Introduction 

Probability theory is well defined for representing uncertainty under the
assumption that a probability distribution is always determined from the given
information. However, this assumption is not satisfied in many real situations.
In the case where we cannot determine a single probability distribution, it is
appropriate to consider a set of probability distributions from the uncertain
information given by estimators. There are many articles [39.2] [39.4] [39.6]
[39.7] [39.8] [39.9] in which uncertain information has been handled by a set of
distributions. The measures in these papers do not satisfy additivity, which
plays an important role in conventional probabilities. Non-additive measures can
be said to be a kind of fuzzy measure [39.12]. Fuzzy measures have been treated in
terms of distribution functions, but density functions have not yet been discussed
for non-additive measures except for belief functions [39.10]. Belief functions
and random sets differ fundamentally from the viewpoint of their underlying
theories. Nevertheless, there is a method by which a belief function including the
given random set can be obtained [39.11].

In this paper, we propose two probability functions by extending values of
probabilities to interval values, which do not satisfy additivity. This idea is
similar to the concept of intuitionistic fuzzy sets [39.1], which can be said to
be fuzzy rough sets [39.5]. According to the definition of interval
probabilities, lower and upper probabilities are defined, respectively. A
combination rule and a conditional probability can also be well defined. The
properties of the proposed measure are clarified.



39.2 Interval Probability Functions 



In this paper, interval probability functions denoted as IPF are proposed by 
two density functions. IPF is an extension of probability values to interval 
probability values. 

Definition 1. The set of two functions denoted as $(h_*, h^*)$ is called IPF
if and only if

(a) $\forall x' \in X,\ h^*(x') \ge h_*(x') \ge 0$,

(b) $\sum_{x \in X} h_*(x) + (h^*(x') - h_*(x')) \le 1$,

(c) $\sum_{x \in X} h^*(x) - (h^*(x') - h_*(x')) \ge 1$.

The above (b) and (c) can be rewritten as

(b') $\sum_{x \in X} h_*(x) + \max_{x'} (h^*(x') - h_*(x')) \le 1$,

(c') $\sum_{x \in X} h^*(x) - \max_{x'} (h^*(x') - h_*(x')) \ge 1$.

Theorem 1. There exists a probability function $h'(x)$ that satisfies

$$h_*(x) \le h'(x) \le h^*(x), \qquad \sum_{x \in X} h'(x) = 1. \tag{39.1}$$

Two distribution functions can be defined by IPF as follows. Let the lower
and upper functions be denoted as $LB(\cdot)$ and $UB(\cdot)$, respectively.

Definition 2. $LB(\cdot)$ and $UB(\cdot)$ can be defined as

$$LB(A) = \min_{h'} \sum_{x \in A} h'(x), \tag{39.2}$$

$$UB(A) = \max_{h'} \sum_{x \in A} h'(x), \tag{39.3}$$

where $h_*(x) \le h'(x) \le h^*(x)$.
From Definition 2, the following theorem holds.

Theorem 2. For $\forall A \subseteq X$, we have

$$LB(A) = \sum_{x \in A} h_*(x) \vee \Bigl(1 - \sum_{x \in X - A} h^*(x)\Bigr), \tag{39.4}$$

$$UB(A) = \sum_{x \in A} h^*(x) \wedge \Bigl(1 - \sum_{x \in X - A} h_*(x)\Bigr). \tag{39.5}$$



Theorem 3. The functions LB and UB are superadditive and subadditive, 
respectively. 



The proof is omitted because of the limited space. It follows clearly from
Theorem 2 that the following dual relation holds.

$$LB(A) = 1 - UB(X - A). \tag{39.6}$$
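
The definitions of this section can be checked numerically with a short Python sketch. It represents an IPF as two dictionaries of lower and upper density values, tests the conditions of Definition 1 in the max form, and evaluates the lower and upper functions using our reading of Theorem 2: the lower function is the larger of the sum of lower values over A and one minus the sum of upper values outside A, and the upper function is the smaller of the sum of upper values over A and one minus the sum of lower values outside A. The function names, tolerance, and sample values are assumptions made for illustration.

```python
def is_ipf(lower, upper, tol=1e-9):
    """Check the conditions of Definition 1 (in the max form) numerically."""
    if any(not (upper[x] >= lower[x] >= 0.0) for x in lower):
        return False
    widest = max(upper[x] - lower[x] for x in lower)
    return (sum(lower.values()) + widest <= 1.0 + tol
            and sum(upper.values()) - widest >= 1.0 - tol)


def lb(lower, upper, subset):
    """Lower function of a subset, following our reading of Theorem 2."""
    rest = set(lower) - set(subset)
    return max(sum(lower[x] for x in subset),
               1.0 - sum(upper[x] for x in rest))


def ub(lower, upper, subset):
    """Upper function of a subset, following our reading of Theorem 2."""
    rest = set(lower) - set(subset)
    return min(sum(upper[x] for x in subset),
               1.0 - sum(lower[x] for x in rest))


# Hypothetical interval densities on X = {a, b, c}.
lower = {"a": 0.2, "b": 0.3, "c": 0.3}
upper = {"a": 0.4, "b": 0.5, "c": 0.4}

valid = is_ipf(lower, upper)
lb_ab = lb(lower, upper, {"a", "b"})
ub_ab = ub(lower, upper, {"a", "b"})
```

With these values the lower function of {a, b} exceeds the sum of the lower functions of {a} and {b}, which illustrates the superadditivity stated in Theorem 3, and the lower function of {a} equals one minus the upper function of {b, c}.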



Let us consider the properties of IPF.

Property 1. There is only one element such that its value of IPF is positive
if and only if

$$\forall x \in (X - \{x_1\}),\ h^*(x) = 0, \qquad h_*(x_1) = h^*(x_1) = 1. \tag{39.7}$$

Property 2. There are only two elements such that these values of IPF are
positive if and only if

$$\forall x \in (X - \{x_1, x_2\}),\ h^*(x) = 0, \qquad h_*(x_1) + h^*(x_2) = h^*(x_1) + h_*(x_2) = 1. \tag{39.8}$$

Property 3. There is no element such that an interval value is positive
($h^*(x) - h_*(x) > 0$) if and only if it is a probability function.

Property 4. There is no case such that only one element has an interval value.

Property 5. There are only two elements such that interval values are
positive ($h^*(x) - h_*(x) > 0$) if and only if

$$\forall x \in (X - \{x_1, x_2\}),\ h_*(x) = h^*(x),$$

$$h_*(x_1) + h^*(x_2) = h^*(x_1) + h_*(x_2) = 1 - \sum_{x \in (X - \{x_1, x_2\})} h_*(x) = 1 - \sum_{x \in (X - \{x_1, x_2\})} h^*(x). \tag{39.9}$$

These properties can be easily proved from the definition of IPF.
39.3 Combination and Conditional Rules for IPF



Let us consider a combination rule to combine two interval probability
functions into one interval probability function.

Definition 3. Let two interval density functions be denoted as
$(h_{1*}(x), h_1^*(x))$ and $(h_{2*}(x), h_2^*(x))$. Then the combination rule is
defined as

$$h_{12*} = h_{1*} \wedge h_{2*}, \tag{39.10}$$

$$h_{12}^* = h_1^* \vee h_2^*. \tag{39.11}$$

It is verified that the combined function $(h_{12*}(x), h_{12}^*(x))$ is also IPF.
This combination rule is proposed from the viewpoint of possibility, although
Dempster's combination rule on belief functions [39.10] is defined from the
viewpoint of necessity. In belief measures, the combination rule entails the
conditional rule, but in IPF the conditional rule is defined independently as
follows.



Definition 4. The lower and upper functions conditioned by $B \subseteq X$ are
defined as

$$LB(A|B) = \frac{LB(AB)}{LB(AB) + UB(B - AB)}, \tag{39.12}$$

$$UB(A|B) = \frac{UB(AB)}{UB(AB) + LB(B - AB)}, \tag{39.13}$$

where $UB(B) \neq 0$, and for the indeterminate form $\frac{0}{0}$ we set
$LB(A|B) = 1$ and $UB(A|B) = 0$. From the dual relation, we can easily see that
$LB(A|X) = LB(A)$ and $UB(A|X) = UB(A)$. Using Definition 4, we can obtain two
density functions as follows:



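$$h_{1*}(x) = LB(\{x\}|B), \tag{39.14}$$

$$h_1^*(x) = UB(\{x\}|B), \tag{39.15}$$

where $h_{1*}(x) = h_1^*(x) = 0$ for $x \notin B$. These two functions can be
rewritten as follows:

$$h_{1*}(x) = LB(\{x\}|B) = \frac{h_{0*}(x)}{h_{0*}(x) + \sum_{x' \in B - \{x\}} h_0^*(x')} \vee \frac{h_{0*}(x)}{h_{0*}(x) + 1 - \sum_{x' \in X - (B - \{x\})} h_{0*}(x')}, \tag{39.16}$$

$$h_1^*(x) = UB(\{x\}|B) = \frac{h_0^*(x)}{h_0^*(x) + \sum_{x' \in B - \{x\}} h_{0*}(x')} \wedge \frac{h_0^*(x)}{h_0^*(x) + 1 - \sum_{x' \in X - (B - \{x\})} h_0^*(x')}, \tag{39.17}$$

where $(h_{0*}, h_0^*)$ is a given IPF.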



Theorem 4. Two probability functions $(h_{1*}, h_1^*)$ obtained by the above
equations satisfy the definition of IPF.
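
A minimal Python sketch of the conditional rule of Definition 4 follows. It reuses the closed forms for the lower and upper functions as we read them from Theorem 2, and it returns 1 for the lower value and 0 for the upper value when a denominator vanishes, which is our interpretation of the convention stated after Definition 4. Function names and sample densities are hypothetical.

```python
def lb(lower, upper, subset):
    """Lower function of a subset (our reading of Theorem 2)."""
    rest = set(lower) - set(subset)
    return max(sum(lower[x] for x in subset),
               1.0 - sum(upper[x] for x in rest))


def ub(lower, upper, subset):
    """Upper function of a subset (our reading of Theorem 2)."""
    rest = set(lower) - set(subset)
    return min(sum(upper[x] for x in subset),
               1.0 - sum(lower[x] for x in rest))


def conditional(lower, upper, a, b):
    """Return (LB(A|B), UB(A|B)) following Definition 4."""
    ab = set(a) & set(b)
    b_rest = set(b) - ab
    lb_den = lb(lower, upper, ab) + ub(lower, upper, b_rest)
    ub_den = ub(lower, upper, ab) + lb(lower, upper, b_rest)
    # Assumed convention for a vanishing denominator: LB = 1, UB = 0.
    lb_cond = lb(lower, upper, ab) / lb_den if lb_den > 0 else 1.0
    ub_cond = ub(lower, upper, ab) / ub_den if ub_den > 0 else 0.0
    return lb_cond, ub_cond


# Hypothetical interval densities on X = {a, b, c}.
lower = {"a": 0.2, "b": 0.3, "c": 0.3}
upper = {"a": 0.4, "b": 0.5, "c": 0.4}

lb_given_b, ub_given_b = conditional(lower, upper, {"a"}, {"a", "b"})
lb_given_x, ub_given_x = conditional(lower, upper, {"a"}, {"a", "b", "c"})
```

Conditioning on the whole set X returns the unconditioned lower and upper values of {a}, in line with the remark that follows Definition 4.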
Here the proof of Theorem 4 is skipped. Conditional probability functions
are IPF. The lower and upper functions based on the IPF defined above are
denoted as $LB_1(AB)$ and $UB_1(AB)$, respectively. Then, we have the following
relation.

$$LB_1(AB) \le LB_1(A|B) \le UB_1(A|B) \le UB_1(AB). \tag{39.18}$$

This means that the lower and upper functions obtained from the probability
functions induced by the conditional rule are wider than the ones directly
calculated by the conditional rule.



39.4 Concluding Remarks 

IPF is useful to obtain interval weights in AHP [39.12]. The definition of IPF
can be regarded as an extension of the normalization of conventional
probabilities. This research work is a first step for interval probability
functions, and many problems with respect to IPF remain for future study.
By a conflict profile we understand a set of data versions representing
different opinions on some matter, generated by agents functioning in some sites
of a distributed system. In order to solve this conflict, the management
system should determine a proper version of data for this matter. The final
data version is called a consensus of the given conflict profile. The main subject
of this paper is the consideration of the existence and reasonableness of a
potential consensus. In other words, we consider problems related to the
consensus susceptibility of conflict profiles.



40.1 Introduction 

Consensus theory [1],[2] is useful in conflict solving. Conflicts in
distributed systems arise as a result of the autonomy of system sites [3]. The
simplest conflict takes place when two bodies have different opinions on the
same subject. In work [5] Pawlak specifies the following elements of a one-value
conflict: a set of agents, a set of issues, and a set of opinions of these agents
on these issues. The agents and the issues are related with one another in some
social or political context. Information tables [6] should be useful for
representing this kind of conflict. In this paper we define a consensus system
which represents multi-value conflicts. In this system we distinguish conflict
profiles containing versions of data which are generated by different
participants of a conflict and refer to a conflict subject. Next, consensus for
conflict profiles is defined and two problems of susceptibility to consensus for
profiles are considered.



40.2 Conflict Profiles 

For representing potential conflicts we use a finite set $\mathbf{A}$ of 
attributes and a set $V$ of attribute elementary values, where 
$V = \bigcup_{a \in \mathbf{A}} V_a$ ($V_a$ is the domain of attribute $a$). 
Let $\Pi(V_a)$ denote the set of subsets of set $V_a$ and 
$\Pi(V_B) = \bigcup_{b \in B} \Pi(V_b)$. Let $B \subseteq \mathbf{A}$; a tuple 
$r_B$ of type $B$ is a function $r_B : B \to \Pi(V_B)$ where 
$(\forall b \in B)(r_b \subseteq V_b)$. A tuple is elementary if all 
attribute values are empty sets or 1-element sets. The empty tuple is denoted 
by the symbol $\phi$. The set of all tuples of type $B$ is denoted by $TYP(B)$ 
and the set of all elementary tuples of type $B$ is denoted by $E\text{-}TYP(B)$. 

We assume that some real world is commonly considered by agents which are 
placed in sites of a distributed system. The interest of the agents consists 
of events which occur (or have to occur) in the world. The task of the agents 
is to determine the values of event attributes (an event is described 
by an elementary tuple of some type). The consensus system is defined as 
a triple $ConsensusSys = (\mathbf{A}, X, P)$, where: $\mathbf{A}$ is a finite 
set of attributes, which includes a special attribute $Agent$; values of 
attribute $a$ are subsets of $V_a$, values of attribute $Agent$ are 1-element 
sets, which identify the agents; $X = \{\Pi(V_a) : a \in \mathbf{A}\}$ is a 
finite set of consensus carriers; $P$ is a finite set of relations on carriers 
from $X$, each relation is of some type $A$ (for $A \subseteq \mathbf{A}$ and 
$Agent \in A$). Relations belonging to set $P$ are classified in such a way 
that each group includes relations representing similar events. For 
identifying relations belonging to a given group the symbols "+" and "-" 
should be used as the upper index. If $P$ is the name of a group, then 
relation $P^+$ is called a positive relation (contains positive knowledge) and 
$P^-$ is the negative relation (contains negative knowledge). The structure of 
the consensus carriers is defined as a distance function between tuples of the 
same type. This function can be defined on the basis of one of the distance 
functions $\rho^P$ and $\delta^P$ between sets of elementary values [4]. 

Definition 40.2.1. For 2 tuples $r$ and $r'$ of type $A$ the distance is the 
number $d(r, r') = \sum_{a \in A} \varphi(r_a, r'_a)$, where 
$\varphi \in \{\rho^P, \delta^P\}$. 

Consensus is considered within a consensus situation, defined as follows: 
Definition 40.2.2. A consensus situation is a pair 
$\langle \{P^+, P^-\}, A \to B \rangle$ where $A, B \subseteq \mathbf{A}$, 
$A \cap B = \emptyset$ and for every tuple $r \in P^+ \cup P^-$ there should 
hold $r_A \neq \phi$. 

The first element of a consensus situation includes the domain from which 
consensus should be chosen, and the second element presents the subjects of 
consensus (i.e. set $Subject(s) \subseteq E\text{-}TYP(A)$) and the content of 
consensus, such that for a subject $e$ there should be assigned only one tuple 
of type $B$. For each subject $e$, 2 conflict profiles 
$profile(e)^+, profile(e)^- \subseteq TYP(A \cup B)$ should be determined. 

Definition 40.2.3. Consensus on subject $e \in Subject(s)$ is a set of 2 tuples 
$\{C(s,e)^+, C(s,e)^-\}$ of type $A \cup B$ which fulfill the following 
conditions: 

a) $C(s,e)^+_A = C(s,e)^-_A = e$, 

b) $\sum_{r \in profile(e)^+} d(r_B, C(s,e)^+_B)$ and 
$\sum_{r \in profile(e)^-} d(r_B, C(s,e)^-_B)$ are minimal, 

c) $C(s,e)^+_B \cap C(s,e)^-_B = \phi$. 

Any tuples $C(s,e)^+$ and $C(s,e)^-$ satisfying conditions a)-b) are called 
consensuses of profiles $profile(e)^+$ and $profile(e)^-$ respectively. 



40.3 Susceptibility to Consensus 

In this section we investigate two problems referring to susceptibility to 
consensus for conflict profiles. For a given situation 
$s = \langle \{P^+, P^-\}, A \to B \rangle$ the following two problems may 
arise: 

1. For a given subject $e \in Subject(s)$, when may any tuples 
$C(s,e)^+, C(s,e)^- \in TYP(A \cup B)$ satisfying conditions a)-b) 
specified in Definition 40.2.3 (that is, being consensuses of profiles 
$profile(e)^+$ and $profile(e)^-$ respectively) create a consensus, that 
is, when may the last condition $C(s,e)^+_B \cap C(s,e)^-_B = \phi$ be 
fulfilled? 

2. If consensus of a profile exists, is it good enough for this profile? 

For explaining the first problem we give the following example: 

Example 40.3.1. In the meteorological system [3] let us consider the 
following situation 
$s = \langle \{Rain^+, Rain^-\}, Region \to Time \rangle$, where the profiles 
$profile(e)^+$ and $profile(e)^-$ determined from relations $Rain^+$ and 
$Rain^-$ for $e = \langle Region : r_1 \rangle$ have the following form: 

profile(e)^+ 

Agent   Region   Time 
a_1     r_1      2a.m.-6a.m. 
a_2     r_1      4a.m.-5a.m. 
a_3     r_1      6a.m.-8a.m. 

profile(e)^- 

Agent   Region   Time 
a_1     r_1      3a.m.-5a.m. 
a_2     r_1      4a.m.-6a.m. 
a_3     r_1      4a.m.-5a.m. 

Using distance function $\rho^P$ we obtain the following tuples $C(s,e)^+$ and 
$C(s,e)^-$ of type $\{Region, Time\}$, which fulfill conditions a)-b) of 
Definition 40.2.3: 
$C(s,e)^+ = \langle Region : r_1, Time : 4a.m.-5a.m. \rangle$, 
$C(s,e)^- = \langle Region : r_1, Time : 4a.m.-6a.m. \rangle$. Let us note 
that these tuples do not satisfy condition c) of Definition 40.2.3, because 
$C(s,e)^+_{Time} \cap C(s,e)^-_{Time} \neq \phi$, thus they cannot create a 
consensus for subject $e$. 

The second problem is the main subject of this section. It often happens 
that for a given conflict it is possible to determine consensus. The question is: 
is the chosen consensus good enough and can it be acceptable as the solution 
of given conflict situation? In other words, is the conflict situation suscep- 
tible to (good) consensus? We will consider the susceptibility to consensus 
for conflict profiles. Before defining the notion of susceptibility to consensus 
below we present an example. 

Example 40.3.2. Let a space $(U, d)$ be defined as follows: $U = \{a, b\}$ 
where $a$ and $b$ are tuples of some type, and the distance function $d$ is 
given as: for $x, y \in U$, $d(x,y) = 0$ if $x = y$ and $d(x,y) = 1$ 
otherwise. Let $X$ be a profile being a set with repetitions, where 
$X = \{50 \cdot a, 50 \cdot b\}$. Assume that $X$ represents the result of 
some voting in which 100 agents take part, each of them giving one vote (for 
$a$ or $b$). There are 50 votes for $a$ and 50 votes for $b$. It is easy to 
note that for profile $X$ the consensus should be equal to $a$ or $b$, but it 
intuitively seems that neither of them is a good consensus, because there is 
a lack of compromise in this conflict situation. Let us now consider another 
profile $X' = \{50 \cdot a, 51 \cdot b\}$. For this profile the only consensus 
should be $b$ and it seems to be a good consensus, which means that this 
profile is susceptible to consensus. 

The above example shows that although consensus may always be chosen 
for a conflict profile, it does not have to be a good one. We define below the 
notion of a profile's susceptibility to consensus. 

Let $X \in \{profile(e)^+, profile(e)^-\}$ for $e \in Subject(s)$, 
$card(X) = n$ and 

$$d(X) = \frac{\sum_{x,y \in X} d(x_B, y_B)}{n(n+1)}, \qquad d(x, X) = \frac{\sum_{y \in X} d(x_B, y_B)}{n},$$ 

$$d_{min}(X) = \min_{x \in TYP(B)} d(x, X), \qquad d_{max}(X) = \max_{x \in TYP(B)} d(x, X).$$ 

Profile $X$ is regular if for each $x, y \in X$ the equality 
$d(x, X) = d(y, X)$ holds; profile $X$ is irregular if there exist two of its 
elements $x$ and $y$ such that $d(x, X) \neq d(y, X)$. 

Definition 40.3.1. Profile $X$ is susceptible to consensus iff 
$d(X) \geq d_{min}(X)$. 

For the profiles defined in Example 40.3.2 we have 
$d(X) = \frac{2 \cdot 50 \cdot 50}{100 \cdot 101} = \frac{50}{101} < d_{min}(X) = \frac{50}{100}$. 
Thus profile $X$ should not be susceptible to consensus. This agrees with 
intuition, because neither $a$ nor $b$ should be a "good consensus" 
for this profile. However, 
$d(X') = \frac{2 \cdot 50 \cdot 51}{101 \cdot 102} = \frac{50}{101} = d_{min}(X')$. 
Thus profile $X'$ should be susceptible to consensus. According to intuition 
$b$ should be a "good" consensus because it dominates $a$ in profile $X'$. 
Below we present some properties of consensus susceptibility. 

Theorem 40.3.1. If $X$ is a regular profile and $card(X) > 1$ then $X$ is not 
susceptible to consensus. 

Profile $X$ in Example 40.3.2 is a regular one, therefore it is not 
susceptible to consensus. Let the symbol $\cup$ represent the sum operation on 
sets with repetitions; we have the following: 

Theorem 40.3.2. Let $X$ and $X'$ be conflict profiles such that 
$card(X) > 1$, $X' = X \cup \{x\}$ for some $x \in X$ and $X$ is regular; then 
profile $X'$ should be susceptible to consensus. 

Notice that profiles $X$ and $X'$ in Example 40.3.2 satisfy the conditions of 
Theorem 40.3.2, so, as stated, $X'$ should be susceptible to consensus. 
Theorem 40.3.2 also shows that if profile $X$ is regular then extending it by 
one of its own elements gives a profile which should be susceptible to 
consensus. The practical sense of this theorem is that if in a given conflict 
situation none of the votes dominates, and a second voting is extended by one 
voter who gives his vote for one of the previous options, then the new profile 
should be susceptible to consensus. 
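
The voting profiles of the example above can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names (`mean_pairwise`, `mean_to_profile`, `d_min`, `is_regular`, `is_susceptible`) are hypothetical, the universe is U = {a, b} with the 0/1 distance of the example, and the susceptibility test is taken as d(X) >= d_min(X). Exact fractions are used so that the boundary case, where both quantities are equal, is not blurred by rounding.

```python
from fractions import Fraction


def discrete_distance(x, y):
    """0/1 distance of the voting example: 0 if x == y, 1 otherwise."""
    return 0 if x == y else 1


def mean_pairwise(profile, dist=discrete_distance):
    """d(X): sum of d(x, y) over all ordered pairs of X, divided by n(n+1)."""
    n = len(profile)
    total = sum(dist(x, y) for x in profile for y in profile)
    return Fraction(total, n * (n + 1))


def mean_to_profile(x, profile, dist=discrete_distance):
    """d(x, X): sum of d(x, y) over y in X, divided by n."""
    return Fraction(sum(dist(x, y) for y in profile), len(profile))


def d_min(profile, universe, dist=discrete_distance):
    """Smallest d(x, X) over all candidate consensuses x in the universe."""
    return min(mean_to_profile(x, profile, dist) for x in universe)


def is_regular(profile, dist=discrete_distance):
    """A profile is regular if d(x, X) is the same for every x in X."""
    return len({mean_to_profile(x, profile, dist) for x in profile}) == 1


def is_susceptible(profile, universe, dist=discrete_distance):
    """Assumed test: susceptible to consensus iff d(X) >= d_min(X)."""
    return mean_pairwise(profile, dist) >= d_min(profile, universe, dist)


universe = ["a", "b"]
X = ["a"] * 50 + ["b"] * 50   # 50 votes for a, 50 votes for b
X_prime = X + ["b"]           # the same voting extended by one vote for b

for name, profile in (("X", X), ("X'", X_prime)):
    print(name,
          "d =", mean_pairwise(profile),
          "d_min =", d_min(profile, universe),
          "regular:", is_regular(profile),
          "susceptible:", is_susceptible(profile, universe))
```

In this sketch the regular profile X fails the test while its one-vote extension X' passes it, which is the behaviour described by the two theorems above.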

For a given conflict profile $X$, where 
$X \in \{profile(e)^+, profile(e)^-\}$, whose elements are tuples of type $B$, 
let $Occ(X, x)$ for $x \in E\text{-}TYP(B)$ denote the number of occurrences 
of elementary tuple $x$ in tuples belonging to $X$. Let $card(X) = n$ and 

$$M = \sum_{y \in E\text{-}TYP(B)} 2\, Occ(X,y)\,(n - Occ(X,y))\, p_y;$$ 

$$X_1 = \{x \in E\text{-}TYP(B) : Occ(X,x) = \tfrac{n}{2}\}; \qquad M_1 = \sum_{y \in X_1} \tfrac{n}{2}\, p_y;$$ 

$$X_2 = \{x \in E\text{-}TYP(B) : 0 < Occ(X,x) < \tfrac{n}{2}\}; \qquad M_2 = \sum_{y \in X_2} Occ(X,y)\, p_y;$$ 

$$X_3 = \{x \in E\text{-}TYP(B) : \tfrac{n}{2} < Occ(X,x) < n\}; \qquad M_3 = \sum_{y \in X_3} (n - Occ(X,y))\, p_y$$ 

for $p_y = 1$ if function $\rho^P$ is used and $p_y = d(y)$ if function 
$\delta^P$ is used, where $d(y)$ is the cost of adding (moving) elementary 
tuple $y$ to (from) profile $X$. 

Theorem 40.3.3. If distance functions $\delta^P$ and $\rho^P$ are used for 
determining consensus then the following dependencies are true: 

a) If $n$ is an odd number then profile $X$ is always susceptible to 
consensus, 

b) If $n$ is an even number then profile $X$ is susceptible to consensus if 
and only if $M_1 + M_2 + M_3 \leq M/(n + 1)$. 

Theorem 40.3.3 makes it possible to state whether a given profile is 
susceptible to consensus or not without determining the consensus. If the 
number of agents taking part in the conflict is odd then the profile is always 
susceptible to consensus, and if this number is even then the condition in b) 
must be satisfied. 
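
The count-based criterion can be compared with the definition on small profiles. The sketch below is a hedged illustration, not the authors' procedure: it assumes a single attribute whose values are sets of elementary values, takes $p_y = 1$, and, as an assumption, measures the distance between two set values by the size of their symmetric difference. It also assumes the criterion for even n reads $M_1 + M_2 + M_3 \leq M/(n+1)$ with $M_1 = \sum (n/2) p_y$, and that susceptibility means d(X) >= d_min(X). All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def susceptible_by_counts(profile):
    """Count-based test with p_y = 1; uses only occurrence numbers Occ(X, y)."""
    n = len(profile)
    if n % 2 == 1:
        return True  # odd number of agents: always susceptible
    occ = Counter(v for values in profile for v in values)
    m = sum(2 * k * (n - k) for k in occ.values())
    m1 = sum(Fraction(n, 2) for k in occ.values() if 2 * k == n)
    m2 = sum(k for k in occ.values() if 2 * k < n)
    m3 = sum(n - k for k in occ.values() if n < 2 * k and k < n)
    return m1 + m2 + m3 <= Fraction(m, n + 1)


def susceptible_by_definition(profile, domain):
    """Direct test d(X) >= d_min(X), enumerating every subset of the domain."""
    n = len(profile)
    pair_sum = sum(len(x ^ y) for x in profile for y in profile)
    d_profile = Fraction(pair_sum, n * (n + 1))
    candidates = [frozenset(c)
                  for r in range(len(domain) + 1)
                  for c in combinations(sorted(domain), r)]
    d_min = min(Fraction(sum(len(c ^ y) for y in profile), n)
                for c in candidates)
    return d_profile >= d_min


domain = {1, 2, 3}
split_vote = [frozenset({1}), frozenset({1}), frozenset({2}), frozenset({2})]
near_agreement = [frozenset({1, 2})] * 3 + [frozenset({1})]
three_agents = [frozenset({1}), frozenset({2}), frozenset({3})]

for name, profile in (("split_vote", split_vote),
                      ("near_agreement", near_agreement),
                      ("three_agents", three_agents)):
    print(name,
          susceptible_by_counts(profile),
          susceptible_by_definition(profile, domain))
```

Under these assumptions the two tests agree on all three toy profiles, and the count-based version never has to search for a consensus.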



40.4 Conclusions 

In this paper we have presented some results on conditions which make it 
possible to determine whether a conflict profile is susceptible to consensus. 
Future work should concern the first problem specified in Section 40.3. Its 
solution should allow us to find out whether a conflict situation is 
consensus-oriented or not. Another interesting aspect of the consensus model 
is the introduction of probabilistic elements into conflict content, which 
would extend the agents' possibilities for representing their opinions. In 
this case tools which make it possible to combine rough set theory and 
probabilistic calculus [7] should be useful. 

A method for extracting relevant information from image sequence data is 
presented. The image sequences, being the output of the video system of an 
Unmanned Aerial Vehicle, are analysed with the use of EM-clustering techniques 
and Rough Set based methods. The possibilities of constructing an automated 
system for recognition/identification of cars on the road on the basis of 
colour-related data are discussed. 



41.1 Introduction 

The issue of constructing and controlling an autonomous Unmanned Aerial 
Vehicle (UAV) is a multi-fold one. The idea of constructing such a vehicle 
(helicopter) for the purposes of traffic control drives the WITAS project (see 
[41.8]). Apart from the difficulties in constructing proper hardware, the 
problem of establishing software is a challenging one. The UAV is supposed to 
recognise the road situation underneath on the basis of sensor readings and 
make decisions about the actions that are to be performed. The issue of 
constructing an adaptive, intelligent and versatile system for identification 
of situations was addressed in [41.5]. In this paper we focus on one of the 
subtasks necessary for the entire system to work: the problem of discerning 
between objects that are visible to the UAV. 

The most crucial information for UAV is provided by its video systems. 
We have to be able to provide UAV control system with information about 
car colors and so on. Such information may allow for making the identification 
that is core for operations performed by UAV, such as tracking a single vehicle 
over some time. 

In the paper we address only a part of the issues that have to be resolved. 
The particular task we are dealing with is the identification of techniques 
that may be used for the purpose of discerning and/or classifying objects from 
image sequence data. Given a series of images gathered by the UAV's video 
system we have to extract the valuable information about cars present in the 
image. The key is to have a compact set of features that at the same time are 
robust, since the image data may be heavily distorted. The unwanted effects 
coming from changes in the UAV's position, lighting conditions, scaling, 
rotation and weather conditions have to be compensated. 



41.2 Data Description 

At the current stage we are dealing with two sequences of images consisting 
of 100 frames each. They represent two situations on the road, each about 
4 seconds long. Every frame is a 24-bit .tiff image with a resolution of 
726 x 512 pixels. The image sequences have been manually interpreted. 
Altogether 18 objects representing cars on the road have been identified. Each 
object instance (colour blob) is represented with 30 attributes: a number 
(identifier) assigned to the object, two numerical attributes representing the 
X and Y coordinates (within an image) of the center of the colour blob 
(object), and 27 attributes representing coordinates in the RGB colour space 
for 9 pixels forming a 3 x 3 matrix surrounding the center of the colour blob. 
For each of the 18 identified objects we have 100 instances, one for each 
image in the sequence (1800 samples in total). 



41.3 The Task 

The overall problem of situation identification on the basis of image (and 
possibly some other) data is very complex. In the first stage, described in 
this paper, we would like to find answers to the following questions: 

1. Is the existing amount of information (27 colour-related attributes) 
sufficient for the construction of a classification support system that is 
able to distinguish between the 18 pre-identified objects? 

2. Is it possible to transform the existing 27-dimensional attribute space 
into a form that better supports car colour classification tasks? 

3. Is it possible to learn the basic concepts allowing for the establishment 
of prototype classification rules given only part of the sequence, say the 
first 50 images, and then properly classify objects for the rest of the 
sequence? 



41.4 The Method 

Initially, an attempt was made to perform car (colour blob) discrimination 
and/or classification with the use of typical methods from the Rough Set 
armory (see [41.4]). Unfortunately, it turned out that the data is too vague 
and distorted for typical tools like RSES ([41.7]) or Rosetta ([41.6]). 

We came to the conclusion that some method for extracting more relevant 
features from the raw data is needed. Therefore we turned our attention 
to unsupervised learning methods that allow for identification of 
characteristic features of objects in the corpus. The main intention is to 
eliminate unwanted effects caused by changes in object RGB colours as the 
object (car) moves between zones of different light. The particular approach 
we apply uses clustering and simple time series analysis. 

First, we perform clustering treating all 1800 measurements as points in 
27-dimensional space (9 points x 3 RGB coordinates). To do the clustering 
we utilise the Expectation Maximisation (EM) method. EM-clustering is an 
iterative, unsupervised clustering method aimed at establishing a possibly 
small number of non-intersecting clusters, constructed under the assumption of 
a normal distribution of objects. For details about EM clustering see [41.1] 
and [41.3]. 

After the clusters have been found we recall the information about the 
sequential character of our data. Namely, we analyse the sequences of cluster 
assignments for each of the 18 cars. Going frame-by-frame we check to which 
cluster the object belongs within the given frame. In this way for each of the 
18 cars we get a vector of 100 cluster assignments. Such vectors may be 
compared, and on the basis of differences between them we may discern one car 
from the others. 



41.5 Results 

The clustering was applied to the entire data. Over several experiments we 
got 15 to 18 clusters on average. For all objects the assignment to clusters 
was very characteristic. In most cases it was possible to distinguish 
2-3 clusters to which the samples corresponding to a single car were 
assigned. These 2-3 clusters contained more than 80% of the samples of the car 
on average. Moreover, it was possible to correlate the change of cluster 
assignments with changes in the lighting of the car on the road. As the car 
enters an area of shadow, the visual perception of its colour changes and so 
does its cluster assignment. This effect is very welcome from our point of 
view since it gives clear evidence of cluster relevance. 

On the basis of clustering new features were constructed for the objects. 
For each object (car) $C_i$ ($i = 1, ..., 18$) we construct new attributes 
$na_1, ..., na_c$ where $c$ is the number of clusters derived. The value of 
attribute $na_j$ for the car $C_i$ is the number of occurrences of an object 
representing the i-th car in the j-th cluster. So, if the value of attribute 
$na_1$ for car $C_i$ is 20 then we know that an object corresponding to this 
car was assigned to the first cluster 20 times out of a hundred. This new set 
of attributes underwent further analysis. By applying Rough Set based 
techniques it was possible to find out that attributes derived from clusters 
are sufficient for discernibility. Namely, it was possible, with the use of 
RSES software (see [41.7]), to calculate a set of if..then.. decision rules 
classifying (discerning) the cars. In this way we got a simple set of rules 
such that there was exactly one rule for each of the 18 cars. 
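
The construction of the count attributes can be sketched in a few lines of Python. This is only a toy illustration: the cluster labels are assumed to come from some earlier clustering step (here they are made up, with clusters numbered from 0), there are 3 cars and 10 frames instead of 18 cars and 100 frames, and the function names `count_features` and `all_discernible` are hypothetical.

```python
from collections import Counter


def count_features(assignments, n_clusters):
    """Return [na_1, ..., na_c]: how often the car's blob fell into each cluster."""
    counts = Counter(assignments)
    return [counts.get(j, 0) for j in range(n_clusters)]


def all_discernible(feature_table):
    """True if no two cars share the same feature vector."""
    vectors = [tuple(v) for v in feature_table.values()]
    return len(set(vectors)) == len(vectors)


# Made-up per-frame cluster labels for three cars over ten frames.
cluster_sequences = {
    "car_1": [0, 0, 0, 0, 2, 2, 2, 2, 2, 2],
    "car_2": [1, 1, 1, 1, 1, 1, 1, 3, 3, 3],
    "car_3": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
}

n_clusters = 4
features = {car: count_features(seq, n_clusters)
            for car, seq in cluster_sequences.items()}

for car, vector in features.items():
    print(car, vector)
print("discernible:", all_discernible(features))
```

Each row of `features` plays the role of one object in the decision table from which the rules are then derived.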

Since clustering has led us to such promising results in terms of the ability 
to discriminate objects, we tried to exploit its potential to the limit. Since 
the clustering process takes some time in the case of 1800 objects and 27 
numerical attributes, we were looking for a way to make it simpler. Reduction 
of computational effort is very important in our case, since the major part of 
the recognition process has to be performed on-line, during UAV operation. We 
found out that the clustering-based approach is quite powerful. We performed 
an experiment using reduced information about colour blobs. Instead of 27 
attributes representing three RGB coordinates of 9 points (3 x 3 matrix) we 
take only three. These three are the averages over the 9 points of the Red, 
Green and Blue coordinate values respectively. For this reduced set of 
features we obtained a clustering and it was still possible to have good 
discernibility between objects. Moreover, the time needed for computation was 
reduced several times. 

The results presented above address the question about amount of useful 
information that can be retrieved from image sequences. The other question 
on our task list was the one about potential abilities for construction of 
classification system. 

Initial experiments aimed at constructing a classification method based 
on inductive learning of concepts were performed. We wanted to check whether 
it is possible to create a system that will be able to classify previously 
unseen objects as being similar to the prototypes learned during presentation 
of the training sample. For this purpose we first split our set of examples 
into halves. One half, used for training, contains the first 50 samples for 
each car, i.e. frames 1 to 50 from both image sequences. The remaining 50 
frames from each sequence form the dataset used for testing. On the basis of 
the training set we establish clustering-based features and decision rules 
using these features. Then we take samples from the testing set and label them 
with the car numbers. 

In the experiments we use a simplified version of the cluster-based attributes 
presented above. Instead of attributes $na_1, ..., na_c$ for training samples 
we take binary attributes $ma_1, ..., ma_c$. Attribute $ma_i$ for a given 
sample is equal to 1 iff $na_i > 0$ for this sample, and 0 otherwise. 

Since we have to check abilities of classification system we start first with 
the learning phase. Learning of classification (decision) rules is done on the 
basis of 18 samples of 50 frames each. So the learning data consists of 18 
objects, each object described by c attribute values, where c is the number 
of clusters. 

The first attempt was performed for testing samples consisting of the entire 
50 remaining frames. By matching those examples against the previously created 
clusters, producing cluster-based attributes and then assigning decisions (car 
numbers) to the samples we got the result for the testing sample. In this 
particular experiment we got perfect accuracy (100%). 

Unfortunately, taking 50 frames requires approximately two seconds which 
is too long for real-time application. Therefore, we would like to be able to 
reduce the number of frames in testing sample to no more than 15-20 and 
still retain good classification ratio. 

To do that we process our test data and produce testing samples with the use 
of a moving window. We set the size of the window to be some integer not 
greater than 50. Then from the 50 frames we produce the testing samples by 
taking as many sequences of the size of the window as possible and calculating 
the cluster-related attributes $ma_1, ..., ma_c$ for them. For instance, if we 
set the size of the window to be 15 then we will get 35 samples for each car. 
The first of these samples will contain frames from 51 to 66 while the last 
will consist of frames 86 to 100. So, altogether for 18 cars we will get 630 
testing instances. 
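
The moving-window construction can be sketched as follows. The code is a toy illustration under assumptions that are not taken from the experiments: the window is assumed to advance one frame at a time and to lie entirely within the test sequence, the cluster labels are made up and numbered from 0, and the function names `window_samples` and `binary_features` are hypothetical.

```python
def window_samples(assignments, window):
    """All contiguous sub-sequences of the given length (stride of one frame)."""
    return [assignments[start:start + window]
            for start in range(len(assignments) - window + 1)]


def binary_features(assignments, n_clusters):
    """Binary attributes: 1 if the cluster occurs at least once in the sample."""
    present = set(assignments)
    return [1 if j in present else 0 for j in range(n_clusters)]


# Made-up cluster labels of one car over six test frames.
test_sequence = [0, 0, 1, 1, 2, 2]
n_clusters = 3

samples = window_samples(test_sequence, window=3)
sample_features = [binary_features(sample, n_clusters) for sample in samples]

for sample, vector in zip(samples, sample_features):
    print(sample, "->", vector)
```

Every window yields one testing instance, so a shorter window produces more instances but each of them sees fewer clusters.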

The key is now to find a window size that is at the same time small 
enough to allow on-line classification and big enough to give good quality 
of this classification. From several attempts we have learned that with the 
methods of attribute generation and decision rule derivation described above, 
we are able to get perfect classification accuracy for the testing sample if 
the size of the window is at least 17. For window sizes less than 17 the 
accuracy decreases, being 89% and 78% for windows of size 16 and 15, 
respectively. It is worth mentioning that these experiments are, at the moment 
of writing, only preliminary. We expect to improve the results by allowing 
more information to be passed to the classifier, e.g. by using the original 
attributes $na_1, ..., na_c$ instead of the simplified $ma_1, ..., ma_c$. 



41.6 Conclusions 

A method for extracting information from image sequences was presented. 
It is based on a combination of unsupervised clustering with a Rough Set 
based approach. From the initial experiments we may see that this approach 
has significant potential and may be further developed into a complete 
solution. The proposed method has to be tuned to fit the requirements for 
co-operation with other components of the UAV's control system as well as 
expectations about robustness, versatility and speed of operation. 

The natural next step is the application of the developed solutions to other 
sets of image data. We expect that some further evolution of the methods 
will be necessary, since many problems may arise. We believe that with more 
data we will be able to generalise our approach using tools such as more 
complex time series analysis. 



42.1 Introduction 

Inductive Logic Programming [42.1] is the research area formed at the in- 
tersection of logic programming and machine learning. Rough set theory 
[42.2, 42.3] defines an indiscernibility relation, where certain subsets of exam- 
ples cannot be distinguished. The gRS-ILP model [42.4] introduces a rough 
setting in Inductive Logic Programming and describes the situation where 
the background knowledge, declarative bias and evidence are such that it is 
not possible to induce any logic program from them that is able to distinguish 
between certain positive and negative examples. Any induced logic program 
will either cover both the positive and the negative examples in the group, or 
not cover the group at all, with both the positive and the negative examples 
in this group being left out. 

The Variable Precision Rough Set (VPRS) model [42.5] is a generalized 
model of rough sets that inherits all basic mathematical properties of the 
original rough set model but allows for a controlled degree of misclassification. 
The Variable Precision Rough Set Inductive Logic Programming (VPRSILP) 
model [42.6] extends the gRS-ILP model using features of the VPRS model. 

This paper applies the VPRSILP model to graphs, and presents the results 
of an illustrative experiment on web usage graphs. 



42.2 The VPRSILP Model and Web Usage Graphs 

The generic Rough Set Inductive Logic Programming (gRS-ILP) model introduces 
the basic definition of elementary sets in ILP [42.4, 42.7]. A parameter 
$\beta$, a real number in the range (0.5, 1], is used in the VPRSILP model 
[42.6] as a threshold in elementary sets that have both positive and negative 
examples, to decide if that elementary set can be classified as positive or 
negative. A standard ILP algorithm GOLEM [42.8] is modified in [42.6] to fit 
this model. The formal definitions and the modified algorithm are omitted here 
due to space constraints. 



42.2.1 A Simple Graph VPRSILP ESD System 

Let $U$ be a universe of examples. Let a graph $G_x$ be associated with every 
$x$ in $U$. Let $G_U = (N_U, E_U)$ be a graph associated with the universe 
such that $G_x$, for every $x \in U$, is a subgraph of $G_U$. $N_U$ is the set 
of nodes and $E_U$ is the set of links of $G_U$. 

Definition 42.2.1. We define a simple-graph-VPRSILP-ESD system as a 
2-tuple $S = (S', G_U)$, where: 

(1) $G_U$ is a directed graph, and 

(2) $S' = (E, B, L, \beta)$ is a VPRSILP-ESD system [42.6] such that 

(i) $E$ is the universe of examples consisting of a unary predicate, say $p$. 
Each example $p(x)$ has a directed graph $G_x$ associated with it which is a 
subgraph of $G_U$. 

(ii) $B$ is the background knowledge consisting of ground unit clauses, using 
the following predicate: edge (of arity 3), where for any $p(x) \in E$: 
$edge(x, sourcenode, destnode) \in B$ $\Leftrightarrow$ the graph associated 
with the example $p(x)$ has an edge from sourcenode to destnode. 

(iii) $L$ is the declarative bias $L = L_{pi} \wedge L_{rd} \wedge L_{eu}$ 
(defined in [42.7]). 

(iv) $\beta$ is a real number in the range (0.5, 1]. 

Our aim is to find a hypothesis $H$ such that 
$P = B \wedge H \in \mathcal{P}_\beta(S)$, the $\beta$-restricted program 
space [42.6]. 



42.2.2 Web Usage Graphs 

We now consider an example of a simple-graph-VPRSILP-ESD system using 
Web usage graphs. Let us consider a particular set of Web pages and links 
between them. Each node in $N_U$ corresponds to one of these Web pages and 
each link in $E_U$ corresponds to one of these links. A single session $x$, 
from the moment a user enters any Web page in $N_U$ until the user finally 
leaves the set of Web pages in $N_U$, is a subgraph of $G_U = (N_U, E_U)$, 
denoted by $G_x = (N_x, E_x)$. 

The universe $U$ is considered to be the set of all such sessions. A session 
$x \in U$ is a positive example ($x \in A$) or a negative example 
($x \in (U - A)$) based on some concept of interest ($A$). 

The notion of Posgraph and Neggraph (that cumulatively capture all 
known positive and negative sessions) is introduced in [42.9]. Using Posgraph 
(Neggraph), edges that are distinctly present in the positive (negative) ses- 
sions are obtained. These edges are considered to be important and used in 
predicting unknown sessions. 

Let $G_U$ be the weighted directed graph representing the Web pages and 
links under consideration. Let $E = E^+ \cup E^-$, 
$E^+ = \{p(e1), p(e2)\}$, $E^- = \{p(e3)\}$. Let 
B = {edge(e1, //m.org/a, //m.org/b), 
edge(e1, //m.org/c, //m.org/d), edge(e2, //m.org/c, //m.org/d), 
edge(e3, //m.org/b, //m.org/c)}. 
$S = (S', G_U)$, where 
$S' = (E, B, L_{pi} \wedge L_{rd} \wedge L_{eu}, \beta)$, is a 
simple-graph-VPRSILP-ESD system. One induced hypothesis $H$, such that 
$P = H \wedge B \in \mathcal{P}_\beta(S)$, for $\beta = 0.5$, is of the form 

p(X) :- edge(X, //m.org/a, //m.org/b), edge(X, //m.org/c, //m.org/d). 

It is seen that for $P = B \wedge H$, $P \vdash p(e1)$, $P \nvdash p(e2)$, 
$P \nvdash p(e3)$. 
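
The coverage of the induced rule can be checked mechanically. The following Python sketch is an illustration only: it encodes the ground facts of the example as triples (as read from the example, with edges a to b and c to d for e1, c to d for e2, and b to c for e3) and treats the rule body as a list of required edges. The names `BACKGROUND`, `RULE_BODY` and `covers` are hypothetical, and the check is a plain set-membership test rather than a logic programming engine.

```python
# Ground unit clauses edge(example, source, destination) of the example.
BACKGROUND = {
    ("e1", "//m.org/a", "//m.org/b"),
    ("e1", "//m.org/c", "//m.org/d"),
    ("e2", "//m.org/c", "//m.org/d"),
    ("e3", "//m.org/b", "//m.org/c"),
}

# Body of p(X) :- edge(X, //m.org/a, //m.org/b), edge(X, //m.org/c, //m.org/d).
RULE_BODY = [
    ("//m.org/a", "//m.org/b"),
    ("//m.org/c", "//m.org/d"),
]


def covers(example, body, background):
    """True if every edge literal of the rule body holds for the example."""
    return all((example, source, dest) in background
               for source, dest in body)


coverage = {example: covers(example, RULE_BODY, BACKGROUND)
            for example in ("e1", "e2", "e3")}
print(coverage)
```

Only e1 traverses both required edges, so the rule covers one of the two positive examples and neither e2 nor the negative example e3.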



42.3 Experimental Illustration 

The dataset used in our experiment is taken from the website 
http://www.cs.washington.edu/research/adaptive/download.html and 
is the data set used in [42.10, 42.11]. The data pertains to web access logs at 
the site http://machines.hyperreal.org during the months of September 
and October 1997. Each day of the month has a separate file. Each file records 
all the requests for Web pages made to the Web server on that particular day. 
Sessions with fewer than 3 edges or more than 499 edges were not considered. 

The dataset ($U$) is divided into positive example sessions ($X$) and negative 
example sessions ($U - X$). As an illustration, all sessions that had an access 
from www.paia.com are treated as positive examples and all sessions that 
had an access from www.synthzone.com as negative examples. 

The data is first preprocessed to determine a set of useful edges based 
on the number of positive and negative sessions traversing the edges. The 
universal graph $G_U$ consists of these useful edges. 

Each elementary set corresponds to the set of examples whose session 
graphs have the same set of edges in $G_U$. The training set is used to 
determine the number of positive and negative examples in each elementary set. 

In the modified GOLEM algorithm [42.6] all elementary sets covered by 
any rule fall within the $\beta$-positive region. A $\beta$ value of 0.5 is 
used in this experiment. The two counters in each elementary set are used to 
calculate the conditional probability, and hence to determine whether the 
elementary set is in the $\beta$-positive region or $\beta$-negative region. 
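
The grouping into elementary sets and the use of the two counters can be sketched as follows. The code is a hedged illustration with made-up sessions and edge names; the function names `elementary_sets` and `region` are hypothetical. The region test is an assumption following the usual variable precision convention: an elementary set is placed in the positive region when the conditional probability of the positive class is at least beta, and in the negative region when it is at most 1 - beta. The exact thresholds of the modified algorithm are defined in [42.6] and are not reproduced here.

```python
from collections import defaultdict


def elementary_sets(sessions, useful_edges):
    """Group sessions by the useful edges they traverse; count [positive, negative]."""
    counters = defaultdict(lambda: [0, 0])
    for edges, is_positive in sessions:
        key = frozenset(edges) & useful_edges
        counters[key][0 if is_positive else 1] += 1
    return dict(counters)


def region(positive, negative, beta):
    """Assumed convention: compare P(positive | elementary set) with beta."""
    probability = positive / (positive + negative)
    if probability >= beta:
        return "positive"
    if probability <= 1 - beta:
        return "negative"
    return "boundary"


useful_edges = frozenset({"a->b", "b->c", "c->d"})

# Made-up training sessions: (edges traversed, is the session positive?).
sessions = [
    ({"a->b", "c->d"}, True),
    ({"a->b", "c->d"}, True),
    ({"a->b", "c->d", "x->y"}, True),   # "x->y" is not a useful edge
    ({"a->b", "c->d"}, False),
    ({"b->c"}, False),
    ({"b->c"}, False),
    ({"c->d"}, True),
    ({"c->d"}, False),
]

counters = elementary_sets(sessions, useful_edges)
for key, (pos, neg) in counters.items():
    print(sorted(key), pos, neg, region(pos, neg, beta=0.5))
```

With beta equal to 0.5 every elementary set lands in either the positive or the negative region under this convention; a larger beta leaves mixed sets in a boundary region.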

The modified GOLEM algorithm is implemented with the following changes. 
(1) For each session, the corresponding elementary set is determined based 
on which of the useful edges are traversed in that session. (2) The maximal 
common subgraph between example sessions is used instead of rlgg. (3) Every 
example is used rather than a random subset. (4) The innermost loop is not 
implemented, since every example is being considered. 

Ten-fold cross-validation was done by using days ending with 0, 1, 2, ... 
9 as the ten sets. The experiment consists of two separate runs. The first 
run uses the original positive and negative examples, whereas the second run 
uses the original negative examples as the positive examples, and the original 
positive examples as the negative examples. In other words, the positive and 
negative examples are inverted in the second run. The results of the ten-fold 
cross-validation in the original run (original positive and negative examples) 
are tabulated below. 

          Positives            Negatives 
Serial    Correct    Wrong     Correct    Wrong 
0         20         33        294        0 
1         24         19        134        2 
2         18         32        215        0 
3         24         47        285        0 
4         20         42        219        1 
5         34         27        209        0 
6         30         32        294        1 
7         18         34        192        0 
8         19         35        205        0 
9         29         34        320        3 
Average   23.6       33.5      236.7      0.7 

The results of the ten-fold cross-validation in the inverted run are 
tabulated below. The original positive and negative examples are inverted 
and are used as negative and positive examples, respectively. 





Serial    Positives           Negatives
          Correct   Wrong     Correct   Wrong
0         68        226       52        1
1         33        103       42        1
2         56        159       50        0
3         83        202       71        0
4         47        173       62        0
5         62        147       60        1
6         66        229       62        0
7         58        134       51        1
8         55        150       51        3
9         63        260       62        1
Average   59.1      178.3     56.3      0.8



The average results in the two runs are tabulated below. The following table 
is the average result of the ten-fold cross-validation on the original positive 
and negative examples. 





                    Pred. Pos   Pred. Neg
Actually Positive   23.6        33.5
Actually Negative   0.7         236.7



The following table is the average result of the ten-fold cross-validation 
on the inverted positive and negative examples. It is to be noted that the 
positives and negatives reported in the table are the original positives and 
negatives (i.e. reinverted from those used in the actual inverted run). 





                    Pred. Pos   Pred. Neg
Actually Positive   56.3        0.8
Actually Negative   178.3       59.1

The tables show that, in the original run, a test case that is predicted 
positive has a 97.1% chance of being positive; in the inverted run, a test 
case that is predicted as an original negative has a 98.7% chance of being an 
original negative. (This high degree of prediction accuracy applies to the 
41.3% of the positive test cases that are predicted positive and the 24.9% of 
the negative test cases that are predicted negative.) 
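
The four percentages follow directly from the averaged tables. The short sketch below recomputes them; the dictionary keys and helper names are illustrative choices, while the numbers are the table entries.

```python
def precision(true_hits, false_hits):
    """Share of the cases predicted as a class that really belong to it."""
    return true_hits / (true_hits + false_hits)

def coverage(true_hits, misses):
    """Share of the cases of a class that are predicted as that class."""
    return true_hits / (true_hits + misses)

# Average confusion-matrix entries of the original and the inverted run.
orig = {"pos_as_pos": 23.6, "pos_as_neg": 33.5, "neg_as_pos": 0.7, "neg_as_neg": 236.7}
inv = {"pos_as_pos": 56.3, "pos_as_neg": 0.8, "neg_as_pos": 178.3, "neg_as_neg": 59.1}

p_pos = precision(orig["pos_as_pos"], orig["neg_as_pos"])  # 23.6 / (23.6 + 0.7)
p_neg = precision(inv["neg_as_neg"], inv["pos_as_neg"])    # 59.1 / (59.1 + 0.8)
c_pos = coverage(orig["pos_as_pos"], orig["pos_as_neg"])   # 23.6 / (23.6 + 33.5)
c_neg = coverage(inv["neg_as_neg"], inv["neg_as_pos"])     # 59.1 / (59.1 + 178.3)

for name, value in [("predicted positive is positive", p_pos),
                    ("predicted negative is negative", p_neg),
                    ("positives predicted positive", c_pos),
                    ("negatives predicted negative", c_neg)]:
    print(f"{name}: {100 * value:.1f}%")
```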



42.4 Conclusions 

The VPRSILP model is applied to Web usage graphs. An illustrative 
experiment on the prediction of Web usage sessions is presented. Possibilities 
for further work include the application of the VPRSILP model to other ILP 
algorithms and to other application areas. 

AHP was proposed to give importance grades with respect to many items. 
The comparison values are conventionally crisp; however, it is easier for a 
decision maker to give them as intervals. Interval comparison values can 
reflect the uncertainty due to human judgement. In this paper, the interval 
importance grade is obtained from an interval comparison matrix so as to 
include the decision maker’s judgement. To choose the crisp importance grades 
and the crisp efficiency within the decision maker’s judgement, we use DEA, 
which is an evaluation method from the optimistic viewpoint. 



43.1 Introduction 

AHP (Analytic Hierarchy Process) was proposed to determine the importance 
grade of each item [43.1]; it is a method for dealing with importance grades 
with respect to many items. In conventional AHP, the crisp importance grade 
of each item is obtained by solving an eigenvector problem with a crisp 
comparison matrix. Since a decision maker’s judgement is uncertain, and it is 
easier for him/her to give it as an interval value than as a crisp value, we 
extend the crisp comparison values to intervals. 

Based on the idea that a comparison matrix is inconsistent due to human 
judgements, a model that gives the importance grade as an interval has been 
proposed [43.2]. We take another way to obtain the interval importance grades, 
based on the eigenvector method and interval regression analysis [43.3]. When 
a decision maker gives comparison matrices for input and output items, the 
interval importance grades of the input and output items are obtained 
respectively. The obtained interval importance grades can be considered as the 
importance grades acceptable to the decision maker. 

We choose the most optimistic importance grades for the analyzed object 
within the intervals by DEA (Data Envelopment Analysis) [43.4], [43.5]. DEA 
is a well-known method for evaluating DMUs (Decision Making Units) from the 
optimistic viewpoint. Since the weights in DEA and the importance grades 
obtained through AHP are similar, DEA is used to choose the most optimistic 
importance grades of input and output items within the decision maker’s 
acceptable ranges [43.6]. Our aim is to choose the importance grades within 
the possible ranges that are estimated from a decision maker’s judgement. 

43.2 Interval AHP with Interval Comparison Matrix 

When a decision maker compares all possible pairs of n items 
$I_1, \ldots, I_n$, we can obtain a comparison matrix A as follows. A decision 
maker’s judgement is usually uncertain. Therefore, it is more suitable to give 
the comparison values as intervals. 

$$
A = \begin{pmatrix}
1 & \cdots & [a_{1n}^L, a_{1n}^U] \\
\vdots & [a_{ij}^L, a_{ij}^U] & \vdots \\
[a_{n1}^L, a_{n1}^U] & \cdots & 1
\end{pmatrix}
$$

where the element of matrix A, $[a_{ij}^L, a_{ij}^U]$, shows the importance 
grade of $I_i$ obtained by comparing it with $I_j$; the diagonal elements are 
equal to 1, that is $a_{ii}^L = a_{ii}^U = 1$; and the reciprocal property is 
satisfied, that is $a_{ij}^L = 1/a_{ji}^U$. 

Then, we estimate the importance grade of item i as an interval denoted 
by $W_i$, which is determined by its center $w_i^c$ and its radius $d_i$ as 
follows. 

$$W_i = [\underline{w_i}, \overline{w_i}] = [w_i^c - d_i,\; w_i^c + d_i]$$

In order to determine the interval importance grades, we have two problems: 
one is to obtain the centers and the other is to obtain the radii. 
The centers are obtained by the eigenvector method with the obtained 
comparison matrix A. Since the elements of A are intervals, their centers are 
used. The eigenvector problem is formulated as follows. 

$$A w = \lambda w \qquad (43.1)$$

Solving (43.1), the eigenvector $(w_1^{c*}, \ldots, w_n^{c*})$ for the 
principal eigenvalue $\lambda_{\max}$ is obtained as the centers of the 
interval importance grades of the items. The centers $w_i^{c*}$ are normalized 
so that $\sum_i w_i^{c*} = 1$. 
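
As an illustration of the center computation, the sketch below applies power iteration to the interval comparison matrix of the output items listed in Table 43.2. Two points are assumptions rather than statements of the text: the arithmetic midpoint is taken as the center of each interval entry, and power iteration is used as one convenient way of approximating the principal eigenvector. The function names are illustrative.

```python
from fractions import Fraction as F

def midpoint(entry):
    """Midpoint of an interval (lo, hi); crisp entries are returned unchanged."""
    if isinstance(entry, tuple):
        lo, hi = entry
        return float(lo + hi) / 2
    return float(entry)

def principal_eigenvector(a, iterations=200):
    """Power iteration; the result is normalized so that its entries sum to 1."""
    n = len(a)
    w = [1.0 / n] * n
    for _ in range(iterations):
        w_new = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(w_new)
        w = [x / total for x in w_new]
    return w

# Interval comparison matrix of the four output items (Table 43.2).
A = [
    [1, (F(1, 6), F(1, 3)), (3, 7), (F(1, 6), F(1, 2))],
    [(3, 6), 1, (6, 8), (2, 4)],
    [(F(1, 7), F(1, 3)), (F(1, 8), F(1, 6)), 1, (F(1, 9), F(1, 3))],
    [(2, 6), (F(1, 4), F(1, 2)), (3, 9), 1],
]
centers = principal_eigenvector([[midpoint(e) for e in row] for row in A])
print([round(w, 3) for w in centers])
```

Under these assumptions the printed values should come out close to the centers reported in Table 43.2.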

The radius is obtained based on interval regression analysis, which finds 
estimated intervals that include the original data. In our problem, $a_{ij}$ 
is approximated by an interval ratio such that the following relation holds. 

$$
[a_{ij}^L, a_{ij}^U] \subseteq \frac{W_i}{W_j} =
\left[ \frac{w_i^c - d_i}{w_j^c + d_j},\; \frac{w_i^c + d_i}{w_j^c - d_j} \right]
\qquad (43.2)
$$

where $W_i / W_j$ is defined as the maximum range. 

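The interval importance grades are determined so as to include the interval 
comparison values. Using the centers $w_i^{c*}$ obtained by (43.1), the radii 
should be minimized subject to the constraint that relation (43.2) is 
satisfied for all elements. 

$$
\begin{aligned}
\min\ & \lambda \\
\text{s.t.}\ & \frac{w_i^{c*} - d_i}{w_j^{c*} + d_j} \le a_{ij}^L \quad (\forall (i, j)) \\
& a_{ij}^U \le \frac{w_i^{c*} + d_i}{w_j^{c*} - d_j} \quad (\forall (i, j)) \\
& d_i \le \lambda \quad (\forall i)
\end{aligned}
\qquad (43.3)
$$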



The interval importance grade shows the acceptable range for a decision 
maker. 
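
The inclusion relation (43.2) that the radii must satisfy can be checked with a few lines of code. The sketch below only tests whether given radii are feasible; it does not perform the minimization in (43.3). The two-item example, the tolerance and all function names are hypothetical.

```python
def interval_ratio(w_i, d_i, w_j, d_j):
    """Maximum range of W_i / W_j for W = [w - d, w + d] with positive bounds."""
    if w_i - d_i <= 0 or w_j - d_j <= 0:
        raise ValueError("lower bounds must be positive")
    return ((w_i - d_i) / (w_j + d_j), (w_i + d_i) / (w_j - d_j))

def includes(outer, inner, tol=1e-9):
    """True if the interval `inner` lies inside `outer` (up to a tolerance)."""
    return outer[0] <= inner[0] + tol and inner[1] <= outer[1] + tol

def radii_feasible(centers, radii, comparisons):
    """comparisons maps (i, j) to the interval (a_L, a_U) given for item i vs item j."""
    return all(
        includes(interval_ratio(centers[i], radii[i], centers[j], radii[j]), a_ij)
        for (i, j), a_ij in comparisons.items()
    )

# Hypothetical two-item example (not taken from the chapter).
centers = [0.6, 0.4]
comparisons = {(0, 1): (1.0, 2.0), (1, 0): (0.5, 1.0)}
wide = radii_feasible(centers, [0.1, 0.1], comparisons)
narrow = radii_feasible(centers, [0.01, 0.01], comparisons)
print(wide, narrow)
```

With the wider radii every comparison interval is covered by the corresponding interval ratio, while the narrow radii are too small to include the given judgements.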

43.3 Choice of Optimistic Weights and Efficiency by DEA 



43.3.1 DEA with Normalized Data 

In DEA, the maximum ratio of output data to input data is taken as the 
efficiency, which is calculated from the optimistic viewpoint for each DMU. 
The basic DEA model is formulated as follows. 

$$
\begin{aligned}
\theta_o^* = \max_{u, v}\ & u^t y_o \\
\text{s.t.}\ & v^t x_o = 1 \\
& -v^t X + u^t Y \le 0 \\
& u, v \ge 0
\end{aligned}
\qquad (43.4)
$$

where the decision variables are the weight vectors u and v, 
$X \in \mathbb{R}^{m \times n}$ and $Y \in \mathbb{R}^{k \times n}$ are the 
input and output matrices consisting of all input and output vectors, which 
are all positive, and n is the number of DMUs. 

In conventional DEA as in (43.4), it is difficult to relate the importance 
of input and output items to their weights, because the weights largely depend 
on the scales of the original data X and Y. Therefore we normalize the given 
input and output data based on DMU_o so that the input and output weights 
represent the importance grades of the items. 

The normalized input and output, denoted by $\hat{x}_{jp}$ and 
$\hat{y}_{jr}$, are obtained as follows. 

$$\hat{x}_{jp} = x_{jp} / x_{op}, \qquad \hat{y}_{jr} = y_{jr} / y_{or}$$

The problem of obtaining the efficiency with the normalized input and output 
is formulated as follows. 

$$
\begin{aligned}
\theta_o^* = \max_{u, v}\ & (u_1 + \cdots + u_k) \\
\text{s.t.}\ & v_1 + \cdots + v_m = 1 \\
& -v^t \hat{X} + u^t \hat{Y} \le 0 \\
& u, v \ge 0
\end{aligned}
\qquad (43.5)
$$

where $\hat{X}$ and $\hat{Y}$ are all the normalized data and u and v are the 
decision variables. The efficiency from the normalized input and output is 
equal to that from the original data by conventional DEA. The obtained weight 
represents the importance grade itself. Then we can use DEA with the 
normalized data to choose the optimistic weight in the interval importance 
grade obtained by a decision maker through interval AHP. 
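
The effect of the normalization can be illustrated with the data of Table 43.1. The sketch below normalizes the data with respect to DMU E and checks whether a weight vector satisfies the constraints of (43.5); it does not solve the linear program. The weight vectors and the function names are hypothetical and chosen only for illustration.

```python
def normalize(rows, o):
    """Divide each DMU's data by the data of the reference DMU o, item by item."""
    ref = rows[o]
    return {dmu: [v / r for v, r in zip(vals, ref)] for dmu, vals in rows.items()}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def feasible(v, u, x_hat, y_hat, tol=1e-9):
    """Constraints of (43.5): sum(v) = 1, u, v >= 0, u'y_j - v'x_j <= 0 for all j."""
    if abs(sum(v) - 1.0) > tol or min(v) < 0 or min(u) < 0:
        return False
    return all(dot(u, y_hat[j]) - dot(v, x_hat[j]) <= tol for j in x_hat)

# Input and output data from Table 43.1 (one input, four outputs).
inputs = {d: [1] for d in "ABCDEFGHIJ"}
outputs = {
    "A": [1, 8, 1, 3], "B": [2, 3, 4, 4], "C": [2, 6, 6, 1], "D": [3, 3, 5, 5],
    "E": [3, 7, 4, 2], "F": [4, 2, 3, 1], "G": [4, 5, 5, 3], "H": [5, 2, 1, 6],
    "I": [6, 2, 7, 1], "J": [7, 1, 2, 5],
}
x_hat, y_hat = normalize(inputs, "E"), normalize(outputs, "E")

v = [1.0]
ok = feasible(v, [0.0, 0.5, 0.0, 0.0], x_hat, y_hat)       # hypothetical weights
too_large = feasible(v, [0.0, 1.0, 0.0, 0.0], x_hat, y_hat)  # violated by DMU A
print(ok, too_large)
```

Because the normalized outputs of the reference DMU are all equal to 1, its objective value in (43.5) is simply the sum of the output weights, which is what makes the weights directly comparable with importance grades.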



43.3.2 Optimistic Importance Grades in Interval Importance Grades 

A decision maker gives comparison values for all pairs of input items and of 
output items, and the comparison matrices for the input and output items, 
whose elements are $[a_{ij}^{in\,L}, a_{ij}^{in\,U}]$ and 
$[a_{ij}^{out\,L}, a_{ij}^{out\,U}]$, are obtained. 

By the proposed interval AHP in 43.2, the importance grades of input 
and output items are denoted as follows. 

$$
W_p^{in} = [\underline{w_p^{in}}, \overline{w_p^{in}}], \qquad
W_r^{out} = [\underline{w_r^{out}}, \overline{w_r^{out}}]
$$

The optimistic or substitutional weights and efficiency are obtained by 
considering the interval importance grades through interval AHP as the 
weight constraints in DEA with the normalized data. By DEA, we can determine 
the optimistic weights for DMU_o in the possible ranges. The constraint 
conditions for the input and output weights are as follows, considering the 
difference between the sums of the centers of the interval importance grades 
and of the weights. 

$$
\underline{w_r^{out}} \le \frac{u_r}{u_1 + \cdots + u_k} \le \overline{w_r^{out}},
\qquad
\underline{w_p^{in}} \le v_p \le \overline{w_p^{in}}
\qquad (43.6)
$$



The problem of choosing the most optimistic weights for DMU_o within the 
decision maker’s judgement is formulated by adding (43.6) to (43.5) as 
constraint conditions. Any optimal solution lies within the interval 
importance grades that are given by a decision maker based on his/her 
evaluation. By the nature of DEA, the optimal weights are obtained from the 
most optimistic viewpoint for DMU_o. Therefore both the decision maker and 
the DMUs are satisfied with the obtained evaluations. 



43.4 Numerical Example 

Data with 1 input and 4 outputs for the example DMUs (A, ..., J) are shown in 
Table 43.1. The interval comparison matrix given by a decision maker and the 
interval importance grades obtained by (43.1) and (43.3) are shown in 
Table 43.2. Interval importance grades reflect inconsistency in the given 
interval comparison matrix. 

Table 43.1. Data with 1-input and 4-output and efficiencies 

DMU   x1   y1   y2   y3   y4   DEA     proposed model
A     1    1    8    1    3    1.000   0.925
B     1    2    3    4    4    0.850   0.703
C     1    2    6    6    1    1.000   0.670
D     1    3    3    5    5    1.000   0.753
E     1    3    7    4    2    1.000   1.000
F     1    4    2    3    1    0.706   0.376
G     1    4    5    5    3    1.000   0.954
H     1    5    2    1    6    1.000   0.548
I     1    6    2    7    1    1.000   0.381
J     1    7    1    2    5    1.000   0.294

Table 43.2. Comparison matrix and importance grades of output items 

      y1           y2           y3       y4           centers   importance grades
y1    1            [1/6, 1/3]   [3, 7]   [1/6, 1/2]   0.135     [0.071, 0.200]
y2    [3, 6]       1            [6, 8]   [2, 4]       0.522     [0.391, 0.652]
y3    [1/7, 1/3]   [1/8, 1/6]   1        [1/9, 1/3]   0.049     [0.029, 0.070]
y4    [2, 6]       [1/4, 1/2]   [3, 9]   1            0.294     [0.163, 0.424]


Within the given interval importance grades, the DMUs are evaluated, and 
their efficiencies obtained by the proposed model, (43.5) with (43.6), are 
shown in Table 43.1. The efficiency through the proposed model is obtained 
from the optimistic viewpoint within a decision maker’s acceptable importance 
grades. Since (43.6) restricts the feasible weights, the efficiencies in the 
proposed model are never larger than those in conventional DEA. 



43.5 Concluding Remarks 

In this paper, we dealt with an interval comparison matrix that contains a 
decision maker’s uncertain judgements and obtained the interval importance 
grade of each item through interval AHP. Then, using DEA, we chose the 
most optimistic weights for DMU_o within the interval importance grades 
obtained from the decision maker. A decision maker’s evaluation and a DMU’s 
opinion are taken into consideration by interval AHP and DEA respectively. 

44.1 Introduction 

The importance of multi-agent systems and of models of agents’ interaction is 
increasing nowadays, as distributed systems of computers have started to play a 
significant role in society. An interaction occurs when two or more agents, 
which have to act in order to attain their objectives, are brought into a 
dynamic relationship. This relationship is the consequence of the limited 
resources which are available to them in a situation. If the resources are 
insufficient to attain the agents’ goals, conflicts often arise. This can 
happen in almost all industrial activities requiring a distributed approach, 
such as network control, the design and manufacture of industrial products or 
the distributed regulation of autonomous robots. However, distributed systems 
are only one of many different areas where a conflict can arise and where it is 
worthwhile to apply computer-aided conflict analysis, just to mention some 
human activities like business, government, political or military operations, 
labour-management negotiations, etc. 

In the paper, we explain the nature of conflict and we define the conflict 
situation model in a way that encapsulates the conflict components in a clear 
manner. We propose some methods to solve the most fundamental problems related 
to conflicts. 



Pawlak Model 

The model introduced in this paper is an enhancement of the model proposed by 
Pawlak in papers such as [44.6, 44.8]. In the Pawlak model, some issues are 
chosen, and the agents are asked to specify their views: are they favourable, 
neutral or against. Thus the analysis is naturally restricted to outermost 
conclusions, like finding the most conflicting attributes or the coalitions of 
agents if more than two take part in the conflict [44.8]. In the real world, 
views on the issues to vote on are consequences of the decision taken, based on 
the local issues, the current state and some background knowledge, using some 
strategy. Therefore, the Pawlak model is enhanced here by adding to the model 
some local aspects of conflicts. 

44.2 Conflict Model 

The information about the local states of an agent ag can be presented in the 
form of an information table, creating the agent ag’s information system 
$I_{ag} = (U_{ag}, A_{ag})$, where $a: U_{ag} \to V_a$ for any $a \in A_{ag}$ 
and $V_a$ is the value set of attribute a. We assume 
$V_{ag} = \bigcup_{a \in A_{ag}} V_a$. Any local state $s \in U_{ag}$ is 
explicitly described by its information vector $Inf_{A_{ag}}(s)$, where 
$Inf_{A_{ag}}(s) = \{(a, a(s)) : a \in A_{ag}\}$. The set 
$\{Inf_{A_{ag}}(s) : s \in U_{ag}\}$ is denoted by $INF(A_{ag})$ and is called 
the information vector set of ag. We assume that the sets $A_{ag}$ are pairwise 
disjoint, i.e., $A_{ag} \cap A_{ag'} = \emptyset$ for $ag \ne ag'$. This 
condition emphasizes that any agent describes the situation in its own way. 
Relationships among attributes of different agents will be defined by 
constraints, as shown in the subsection on constraints below. 



Local Set of Goals (Similarity of States) 

Every agent evaluates its local states. The subjective evaluation corresponds 
to an order (or partial order) of the states of the agent’s information table. 
We assume that a function $e_{ag}$, called the target function, assigns an 
evaluation score to each state; let, for example, $e_{ag}: U_{ag} \to [0, 1]$. 
The states with score 1 are the most preferred by the agent as target states, 
while the states with score 0 are not acceptable. Maximal elements (determined 
by a partial order) can be interpreted as the targets of the agent, i.e., the 
agent wants to reach them, e.g., in a negotiation process. 

More precisely, the agent ag’s set of goals (targets), denoted by T(ag), is 
defined as the set of target states of ag, which means 
$T(ag) = \{s \in U_{ag} : e_{ag}(s) \ge \mu_{ag}\}$, where $\mu_{ag}$ is the 
acceptance level chosen by the agent ag; which evaluation level is acceptable 
is the agent’s subjective choice. 

The state evaluation can also help us to find the state similarity [44.4]. For 
any $\varepsilon > 0$ and $s \in U_{ag}$, we define the ε-neighbourhood of s 
by $\tau_{ag,\varepsilon}(s) = \{s' \in U_{ag} : |e_{ag}(s) - e_{ag}(s')| \le \varepsilon\}$. 
The family $\{\tau_{ag,\varepsilon}(s)\}_{s \in U_{ag}}$ defines a tolerance 
relation $\tau_{ag,\varepsilon}$ in $U_{ag} \times U_{ag}$ by 
$s\,\tau_{ag,\varepsilon}\,s'$ iff $s' \in \tau_{ag,\varepsilon}(s)$. 



Local Conflict 

The agent ag is in the ε-local conflict in a state s iff s does not belong to 
the ε-neighbourhood of s’, for any s’ from the set of ag-targets, where ε is a 
given threshold. Local conflicts for an agent ag arise from the low level of 
subjective evaluation of the current state by ag. This can be expressed 
differently: the state s does not belong to the ε-environs of the set of goals 
T(ag), i.e. 

$$s \notin \bigcup_{s' \in T(ag)} \tau_{ag,\varepsilon}(s')$$

where $\tau_{ag,\varepsilon}(s') = \{s'' : s''\,\tau_{ag,\varepsilon}\,s'\}$. 
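
The definitions of the set of goals, the ε-neighbourhood and the local conflict can be sketched as follows. The evaluation scores are hypothetical, the function names are illustrative, and the non-strict comparisons (score at least the acceptance level, distance at most ε) are assumptions about the intended inequalities.

```python
def target_set(evaluation, mu):
    """T(ag): states whose evaluation score reaches the acceptance level mu."""
    return {s for s, score in evaluation.items() if score >= mu}

def neighbourhood(evaluation, s, eps):
    """States whose evaluation score differs from that of s by at most eps."""
    return {t for t, score in evaluation.items() if abs(evaluation[s] - score) <= eps}

def in_local_conflict(evaluation, s, mu, eps):
    """True iff s lies in no eps-neighbourhood of a target state."""
    return all(s not in neighbourhood(evaluation, t, eps)
               for t in target_set(evaluation, mu))

# Hypothetical evaluation scores e_ag for four local states of one agent.
e_ag = {"s1": 1.0, "s2": 0.8, "s3": 0.5, "s4": 0.0}
for state in sorted(e_ag):
    print(state, in_local_conflict(e_ag, state, mu=0.9, eps=0.25))
```

With acceptance level 0.9 only s1 is a target, so s1 and s2 are free of local conflict, whereas s3 and s4 are too far from any target state.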

Situation 

Let us consider a set Ag consisting of n agents $ag_1, \ldots, ag_n$. A 
situation of Ag is any element of the Cartesian product 
$S(Ag) = \prod_{i=1}^{n} INF^*(ag_i)$, where $INF^*(ag_i)$ is the set of all 
possible information vectors of agent $ag_i$, defined by 
$INF^*(ag) = \{f : A_{ag} \to \bigcup_{a \in A_{ag}} V_a(ag) : f(a) \in V_a(ag) \text{ for } a \in A_{ag}\}$. 
The situation S corresponding to a global state 
$s = (s_1, \ldots, s_n) \in U_{ag_1} \times \cdots \times U_{ag_n}$ is defined 
by $(Inf_{A_{ag_1}}(s_1), \ldots, Inf_{A_{ag_n}}(s_n))$. 



Constraints 

Constraints are described by some dependencies among local states of agents. 
Without any dependencies, any agent could take the state freely and there is no 
conflict at all. Dependencies come from the bound on the number of resources 
(any kind of resource may be considered, e.g. water on the Golan Hills, see 
[44.8], or an international position [44.5], everything that is essential for 
agents). Constraining relations are introduced to express which local states of 
agents can coexist in the (global) situation. More precisely, constraints are 
used to define a subset of S(Ag) of global situations. Constraints restrict the 
set of possible situations to admissible situations satisfying the constraints. 



Situations Evaluation 

Usually agents tend to attain the best states without caring about the global 
good. However, negotiators’ experience shows that a real, stable consensus 
can only be found when the global good is considered. Thus the objective 
evaluation of situations is introduced: an expert’s (an arbiter’s) judgement. 
For example, the United Nations Organisation can be thought of as an expert in 
military conflicts. 

We assume there is a function $q: S(Ag) \to [0, 1]$, called the quality 
function, which assigns a score to each situation (this score is assumed to be 
given by an expert). The set of situations satisfying a given level of quality 
t is defined by: $Score_{Ag}(t) = \{S \in S(Ag) : q(S) \ge t\}$. 



System with Constraints 

The multi-agent system, with local states for each agent defined and the global 
situations satisfying constraints, will be called the system with constraints. We de- 
note our system with constraints by 

44.3 Analysis 

The conflict model introduced above makes it possible first to understand and 
then to analyse different kinds of conflicts. In particular, the most 
fundamental problem can be widely investigated, that is, the possibility of 
achieving consensus. Because of the lack of space, only the consensus problem 
on local preferences is described in this paper. We propose Boolean reasoning 
[44.1] and Rough Set methodology [44.7] for all analyses. The main idea of 
Boolean reasoning is to encode the optimisation problem π by a corresponding 
Boolean function $f_\pi$ in such a way that any prime implicant of $f_\pi$ 
states a solution of π. The elementary Boolean formula is usually obtained here 
by transforming the information table into a decision table, generating rules 
(minimal with respect to the number of attributes on the left side) and 
determining the description of the decision class [44.9]. From the elementary 
formulas the final formula describing the problem is shaped. 

Unfortunately, calculating prime implicants of such formulas is usually a 
computationally hard problem [44.4]. Therefore, depending on the formula, some 
simple strategies or, eventually, quite complex heuristics must be used to 
resolve the problem in real time. 



Consensus Problem on Local and Global Level 

At this point a conflict analysis is proposed in which the local information 
tables and the sets of local goals are taken into consideration. 

INPUT 

The system with constraints defined in Section 44.2. 

t - an acceptable threshold of the objective global conflict for Ag. 

OUTPUT 

All situations with the objective evaluation reduced to degree at most t, and 
without local conflict for any agent (it is required that any new situation is 
constructed in such a way that all local states in this situation are 
favourable for the agents). 

ALGORITHM 

The algorithm is based on verification of global situations from 
$Score_{Ag}(t)$ against the local sets of goals of the agents and the 
constraints. The problem is described by the formula 
$f = \bigwedge_{ag \in Ag} f_{ag} \wedge f_t$, where $f_{ag}$ describes the set 
of goals of the agent ag, and $f_t$ describes $Score_{Ag}(t)$ and the 
constraints. The formula f represents all admissible situations without the 
global conflict regarding the threshold t. 
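
For very small systems the verification step can be sketched by plain enumeration. This is not the Boolean-reasoning procedure with prime implicants described above but a brute-force substitute for illustration only. It assumes that `acceptable[ag]` already holds the local states of ag that are free of local conflict; the resource-sharing example, the constraint, the quality function and all names are hypothetical.

```python
from itertools import product

def consensus_situations(local_states, acceptable, constraint, quality, t):
    """Situations that satisfy the constraint, have quality >= t and put every
    agent in a local state without local conflict."""
    agents = sorted(local_states)
    found = []
    for combo in product(*(local_states[ag] for ag in agents)):
        situation = dict(zip(agents, combo))
        if (constraint(situation)
                and quality(situation) >= t
                and all(situation[ag] in acceptable[ag] for ag in agents)):
            found.append(situation)
    return found

# Hypothetical example: two agents share 3 units of a resource.
local_states = {"a1": [0, 1, 2, 3], "a2": [0, 1, 2, 3]}
acceptable = {"a1": {2, 3}, "a2": {1, 2, 3}}

def constraint(situation):
    """At most 3 units can be distributed in total."""
    return sum(situation.values()) <= 3

def quality(situation):
    """Expert score in [0, 1]; more even splits are rated higher."""
    return 1 - abs(situation["a1"] - situation["a2"]) / 3

result = consensus_situations(local_states, acceptable, constraint, quality, t=0.5)
print(result)
```

The enumeration grows exponentially with the number of agents, which is one reason why the text turns to Boolean reasoning and heuristics for realistic problem sizes.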



44.4 Conclusions 



We have presented and discussed an extension of the Pawlak conflict model. 
Understanding the underlying local states, as well as the constraints in the 
given situation, is the basis for any analysis of our world. The local goals 
and the evaluation of the global situation are regarded as factors defining the 
strength of the conflict and can suggest the way to reach consensus. 

The fundamental consensus problem has been analysed in the paper. Boolean 
reasoning and rough set theory have been successfully applied to solve the 
presented problem. Lack of space did not allow the authors to present a 
conflict example; see [44.2] for an exemplary conflict analysis within the 
proposed model. 

Rough Set theory and Granular Computing (GrC) have a great impact on 
the study of intelligent information systems. This paper investigates the 
feasibility of applying Rough Set theory and Granular Computing (GrC) to deal 
with imperfect data in Inductive Logic Programming (ILP). We propose a 
hybrid approach, RS-ILP, to deal with some kinds of imperfect data which 
occur in real-world applications. 



45.1 Introduction 

Inductive Logic Programming (ILP, see [45.2, 45.7, 45.8]) can be regarded as 
a new method in machine learning with the advantages of more expressive 
power and ease of using background knowledge. If databases are involved, ILP 
is also relevant to Knowledge Discovery and Data Mining (KDD, see [45.1, 
45.3]). In a simplified form, the normal problem setting of ILP is as follows: 

Given: 

— The target predicate p. 

— The positive examples $E^+$ and the negative examples $E^-$ (two sets of 
ground atoms of p). 

— Background knowledge B (a finite set of definite clauses). 

To find: 

— Hypothesis H (the defining clauses of p) which is correct with respect to 
$E^+$ and $E^-$, i.e. 

1. $H \cup B$ is complete with respect to $E^+$ (that is: for all 
$e \in E^+$, $H \cup B$ implies e). We also say that $H \cup B$ covers all 
positive examples. 

2. $H \cup B$ is consistent with respect to $E^-$ (that is: for no 
$e \in E^-$, $H \cup B$ implies e). We also say that $H \cup B$ rejects any 
negative examples. 

To make the ILP problem meaningful, we assume the following prior conditions: 

1. B is not complete with respect to $E^+$ (otherwise there will be no learning 
task at all, because the background knowledge itself is the solution). 

2. $B \cup E^+$ is consistent with respect to $E^-$ (otherwise there will be no 
solution to the learning task). 

In the above normal problem setting for ILP, everything is assumed correct 
and perfect. But in large, real-world empirical learning, data are not 
always perfect. On the contrary, uncertainty, incompleteness, vagueness, 
impreciseness, etc. are frequently observed in the input to ILP - the training 
examples and/or background knowledge. Imperfect input, in addition to improper 
bias setting, will induce imperfect hypotheses. Thus ILP has to deal with 
imperfect data. In this respect, the theory, measurement, techniques and 
experience are much less mature for ILP than for the traditional 
attribute-value learning methods (compare [45.12], for example). 

We observe that many problems concerning imperfect input or too strong 
bias in ILP have a common feature. In these situations, while it is impossible 
to differentiate distinct objects, we may consider granules - sets of objects 
drawn together by similarity, indistinguishability, or functionality. The 
emerging theory of Granular Computing (GrC) (see [45.15, 45.16, 45.14]) grasps 
the essential concept - granules, and makes use of them in general problem 
solving. 

In this paper we concentrate on a particular form of GrC, Pawlak’s Rough 
Set theory [45.9, 45.10, 45.11], investigating its potential for dealing with 
imperfect data in ILP. The main idea is that, when we use granules instead 
of individual objects, we are actually relaxing the strict requirements of the 
standard normal problem setting for ILP. In the following sections, we will 
discuss some kinds of imperfect data in ILP and propose a hybrid approach, 
RS-ILP, as a solution using Rough Set theory, to deal with such imperfect 
data. 



45.2 Imperfect Data in ILP 

We discuss here two kinds of imperfect data encountered in ILP as examples. 

— Incomplete background knowledge. 

Background knowledge B is essential in ILP learning, and the ease of using 
background knowledge is one of the major advantages of ILP over traditional 
attribute-value learning methods. However, if B lacks essential predicates 
(or essential clauses of some predicates), it is possible that no non-trivial 
hypothesis H can be induced. (Note that $E^+$ itself can always be regarded 
as a hypothesis, but it is trivial.) In some cases, even if a large number of 
positive examples are given, some examples are not generalized by hypotheses 
if some background knowledge is missing. This has been a big topic 
in the research area of ILP. 

— Missing classification. 

This means that some examples have unknown classification values (i.e., 
we do not know if an example belongs to $E^+$ or $E^-$). Here we have a set 
of classified training instances $E^+ \cup E^-$ and a set of unclassified 
instances $E^?$. If the classified set is small and the unclassified set $E^?$ 
is ignored, we are faced with the problem of data too sparse to induce a 
reliable hypothesis H. But here we have a set of additional examples $E^?$, 
though we do not know their classification. The challenge is how to utilize our 
knowledge about $E^?$ to induce more reliable hypotheses. One approach is to 
combine learning and conceptual clustering techniques (see [45.2]): a 
conceptual clustering algorithm is applied to the set of all known examples, 
climbing the hierarchy tree and using the classified examples to identify 
class descriptions forming H. 

We have proposed several rough problem settings of ILP (RS-ILP for 
short) to deal with such imperfect data. The key idea is to relax the 
requirement in the normal problem setting that H should be “correct with 
respect to $E^+$ and $E^-$”, so that rough but useful hypotheses can be 
induced. Some of them will be discussed in the following sections. 



45.3 RS-ILP for Missing Classification 

If $E^+ \cup E^-$ is a small set, we cannot expect that the induced hypothesis 
H will have high prediction accuracy. Sometimes we may have an additional set 
of examples $E^?$ that are unclassified (that is, we do not know whether these 
examples belong to $E^+$ or $E^-$). Can we utilize $E^?$ to increase the 
prediction accuracy? We propose the following rough problem setting for this 
purpose: 

Given: 

— The target predicate p (the set of all ground atoms of p is U). 

— An equivalence relation R on U (we have the approximation space 
A = (U, R)). 

— A set of positive examples $E^+ \subseteq U$ and a set of negative examples 
$E^- \subseteq U$. 

— A set of unclassified examples $E^? \subseteq U$. 

— Background knowledge B. 

Considering the following rough sets: 

1. $E^{+?} = E^+ \cup \{e^? \in E^? \mid \exists_{e \in E^+}\; e R e^?\}$; 

2. $E^{-?} = E^- \cup \{e^? \in E^? \mid \exists_{e \in E^-}\; e R e^?\}$. 

To find: 

— Hypothesis $H^?$ (the defining clauses of p) which is correct with respect 
to $E^{+?}$ and $E^{-?}$. That is, 

1. $H^? \cup B$ covers all examples of $E^{+?}$; 

2. $H^? \cup B$ rejects all examples of $E^{-?}$. 

In such a rough problem setting, we use the equivalence relation R to 
“enlarge” the training set (by distributing some examples from $E^?$ to $E^+$ 
and $E^-$). Different R will produce different hypotheses $H^?$. It is 
reasonable to expect that the more unclassified examples are added to $E^+$, 
the more general the induced hypothesis will be; the more unclassified examples 
are added to $E^-$, the more specific the induced hypothesis will be. 
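
The construction of the two enlarged example sets can be sketched in a few lines. The sketch covers only the distribution of unclassified examples, not the induction of the hypothesis. The relation used here (equal parity of an integer argument) and the example sets are hypothetical stand-ins for R and for ground atoms of p.

```python
def enlarge(classified, unclassified, related):
    """Add every unclassified example that is R-related to some classified one."""
    return set(classified) | {u for u in unclassified
                              if any(related(e, u) for e in classified)}

def same_parity(a, b):
    """Hypothetical equivalence relation R on the examples."""
    return a % 2 == b % 2

e_pos = {2, 4}     # positive examples
e_neg = {9}        # negative examples
e_unk = {6, 7, 8}  # unclassified examples

e_pos_q = enlarge(e_pos, e_unk, same_parity)  # plays the role of the enlarged positive set
e_neg_q = enlarge(e_neg, e_unk, same_parity)  # plays the role of the enlarged negative set
print(sorted(e_pos_q), sorted(e_neg_q))
```

If an equivalence class of R contains both positive and negative examples, its unclassified members end up in both enlarged sets, so the choice of R decides whether a correct hypothesis can still exist.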



45.4 RS-ILP for Too Strong Bias 

Declarative bias (restrictions on the hypothesis space and/or on the search 
strategies) is necessary in any inductive learning (and so in ILP). Clearly, 
there is a trade-off between the tractability of search, which is improved by a 
small search space, and the availability of a correct hypothesis, which is 
improved by a large search space. In particular, if the bias is too strong, we 
may miss some useful solutions or have no solution at all. Most ILP systems 
provide mechanisms for the user to specify bias, and allow the user to change 
bias (weakening the restrictions when the current ILP session fails). This 
strategy is called bias shift. 

Here we investigate this problem from another point of view. Suppose 
that the training set $E^+ \cup E^-$ and the background knowledge B are 
perfect, but that we restrict the hypotheses to non-recursive clauses (a bias 
often imposed in some ILP systems); then we still may not find any meaningful 
hypothesis in the normal problem setting of ILP. However, we can relax the 
requirement in the normal problem setting of ILP that H should be “correct 
with respect to $E^+$ and $E^-$”, in order to find a “rough” solution that is 
within the language defined by the bias. 



45.5 Concluding Remarks 

This paper addressed the problem of imperfect data handling in Inductive 
Logic Programming (ILP) using some ideas, concepts and methods of Rough 
Set theory and GrC. We presented a hybrid approach, RS-ILP, to deal with 
some kinds of imperfect data which occur in large real-world applications. 
Although some parts of this work are still at an initial stage, we believe that 
the general ideas presented here may give rise to more concrete results in 
future research. Future work in this direction includes finding more concrete 
formalisms and methods to deal with other kinds of imperfect data, and giving 
quantitative measures associated with hypotheses induced by RS-ILP. 

The paper realizes a step in developing a foundation for approximate 
reasoning from experimental data to conclusions in natural language. Granule 
decomposition strategies based on background knowledge are outlined. 



46.1 Introduction 

Information granulation belongs to a collection of intensively studied topics 
in soft computing (see, e.g., [46.19], [46.20], [46.21]). One of the recently 
emerging approaches to deal with information granulation is based on 
information granule calculi (see, e.g., [46.10], [46.12], [46.15], [46.13]) 
developed on the basis of the rough set [46.6] and rough mereological 
approaches (see, e.g., [46.9], [46.10], [46.12]). The development of such 
calculi is important for making progress in many areas like object 
identification by autonomous systems (see, e.g., [46.1], [46.18]), web mining 
(see, e.g., [46.4]), approximate reasoning based on information granules (see, 
e.g., [46.15], [46.7]) or spatial reasoning (see, e.g., [46.2], [46.8]). In 
particular, reasoning methods using background knowledge as well as knowledge 
extracted from experimental data (e.g., sensor measurements) represented by 
concept approximations [46.1] are important for making progress in such areas. 

Schemes of approximate reasoning (AR-schemes, for short) are derived 
from parameterized productions [46.11], [46.13]. The productions, specifying 
properties of operations on information granules, are assumed to be extracted 
from experimental data and background knowledge. The problem of deriving
AR-schemes is closely related to perception (see, e.g., [46.21]). In this
paper we outline some methods for the decomposition of information granules.



46.2 Granule Decomposition 

In this section, we briefly discuss the granule decomposition problem. This is
one of the basic problems in the synthesis of approximate schemes of reasoning
from experimental data. We restrict our considerations to the case of
information granule decomposition supported by background knowledge. Some
other decomposition methods are presented in [46.9], [46.5].

Assume that a knowledge base contains a fact expressing that if two
objects belong to concepts $C_1$ and $C_2$, then the object constructed out of
them by means of a given operation $f$ belongs to the concept $C$, provided
that the two objects satisfy some constraints. However, we can only
approximate these concepts on the basis of available data. Using a
(generalized) rough set approach [46.14], one can assume that an inclusion
measure $\nu_p$ for $p \in [0,1]$ is given, making it possible to estimate the
degree of inclusion of data patterns $Pat$, $Pat_1$, and $Pat_2$ from
languages $L$, $L_1$, and $L_2$ in the concepts $C$, $C_1$, and $C_2$,
respectively. Patterns included to a satisfactory degree $p$ in a concept
are classified as belonging to its lower approximation, while those included
to a degree less than a preset threshold $q < p$ are classified as belonging
to its complement. Information granule decomposition supported by background
knowledge is accomplished by searching for patterns $Pat$ of high quality
(e.g., supported by a large number of objects) and included to a satisfactory
degree in the target concept $C$. These patterns are obtained by performing a
given operation $f$ on some input patterns $Pat_1$ and $Pat_2$ (from languages
$L_1$ and $L_2$, respectively) sufficiently included in $C_1$ and $C_2$,
respectively.

One can develop a search method for such patterns $Pat$ based on
tuning the inclusion degrees $p_1$, $p_2$ of the input patterns $Pat_1$,
$Pat_2$ in $C_1$, $C_2$, respectively, to obtain patterns $Pat$ (constructed
from $Pat_1$, $Pat_2$ by means of a given operation $f$) included in $C$ to a
satisfactory degree $p$ and of acceptable quality (e.g., supported by a number
of objects larger than a given threshold).

Assume degrees $p_1$, $p_2$ are given. There are two basic steps in the
search procedures for relevant pairs of patterns $(Pat_1, Pat_2)$: (i)
searching in languages $L_1$ and $L_2$ for sets of patterns included to a
degree of at least $p_1$ and $p_2$ in concepts $C_1$ and $C_2$, respectively;
(ii) selecting satisfactory pairs of patterns from the sets of patterns
generated in step (i).

We would like to add some general remarks on the above steps. 

One can see that our method is based on a decomposition of the degree $p$
into degrees $p_1$ and $p_2$ under some constraints. In step (ii), we search
for a relevant constraint relation $R$ between patterns. By $Sem(Pat)$ we
denote the meaning of $Pat$ in, e.g., a given information system. The goal is
to extract the following approximate rule of reasoning:

if

$R(Sem(Pat_1), Sem(Pat_2)) \wedge \nu_{p_1}(Sem(Pat_1), C_1) \wedge \nu_{p_2}(Sem(Pat_2), C_2)$

then

$\nu_p(f(Sem(Pat_1) \times Sem(Pat_2)), C) \wedge Quality_t(f(Sem(Pat_1) \times Sem(Pat_2)))$

where $p$ is a given inclusion degree, $t$ is a threshold of the pattern
quality measure $Quality_t$, $f$ is an operation on objects (patterns), $Pat$
is the target pattern, $C$, $C_1$, $C_2$ are given concepts, $R$, $p_1$, $p_2$
are expected to be extracted from data, and $(Pat_1, Pat_2)$ satisfies $R$ (in
the case we consider, $R$ is represented by a finite set of pattern pairs).

One can also consider soft constraint relations $R_r$, where $r \in [0,1]$ is
the degree of truth to which the constraint relation holds.

Two sets $P_1$, $P_2$ are returned as the result of the first step. They
consist of pairs $(pattern, degree)$, where $pattern$ is included in $C_1$,
$C_2$, respectively, to a degree of at least $degree$.

These two sets are used to learn the relevant relation R. We outline two 
methods. 

The first method is based on an experimental decision table $(U, A, d)$
[46.6], where $U$ is a set of pairs of patterns discovered in the first step;
$A = \{deg_1, deg_2\}$ consists of two attributes such that
$deg_i((Pat_1, Pat_2))$ is equal to the degree to which $Pat_i$ is at least
included in $C_i$ for $i = 1, 2$; and the decision $d$ has the value $p$ to
which the granule composed by means of the operation $f$ from
$(Pat_1, Pat_2)$ is at least included in $C$. From this decision table,
decision rules of a special form are induced: if $deg_1 \ge p_1 \wedge
deg_2 \ge p_2$ then $d \ge p$, where $(p_1, p_2)$ is a minimal degree pair
such that if $p'_1 \ge p_1$ and $p'_2 \ge p_2$, then the decision rule
obtained from the above rule by replacing $p_1, p_2$ with $p'_1, p'_2$,
respectively, is also true in the considered decision table.
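
The rule induction of the first method can be illustrated by a minimal sketch.
It assumes that the decision table is given as a list of triples
(deg_1, deg_2, d), that candidate thresholds are taken from the degrees
occurring in the table, and that a rule must be supported by at least one row;
the function name and these choices are illustrative assumptions, not part of
the method as published.

```python
from itertools import product


def minimal_degree_pairs(rows, p):
    """Return minimal pairs (p1, p2) for the rule
    'if deg1 >= p1 and deg2 >= p2 then d >= p'.

    rows: list of (deg1, deg2, d) triples (hypothetical table layout).
    Candidate thresholds are the degrees that occur in the table.
    """
    cands1 = sorted({r[0] for r in rows})
    cands2 = sorted({r[1] for r in rows})

    def rule_is_true(p1, p2):
        # Decisions of all rows matching the left-hand side of the rule.
        covered = [d for (a, b, d) in rows if a >= p1 and b >= p2]
        # Assumption: a rule with no supporting row is not accepted.
        return bool(covered) and all(d >= p for d in covered)

    valid = [(p1, p2) for p1, p2 in product(cands1, cands2)
             if rule_is_true(p1, p2)]
    # Keep only pairs that are not dominated by another valid pair.
    return [
        (p1, p2) for (p1, p2) in valid
        if not any((q1, q2) != (p1, p2) and q1 <= p1 and q2 <= p2
                   for (q1, q2) in valid)
    ]


table = [(0.9, 0.8, 0.95), (0.7, 0.9, 0.9), (0.6, 0.6, 0.5), (0.9, 0.6, 0.7)]
print(minimal_degree_pairs(table, p=0.85))
```

Raising either threshold of a true rule only shrinks the set of matching rows,
so every pair above a returned minimal pair also yields a true rule, which is
the closure property required in the text.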

A version of such a method has been proposed in [46.9]. The relation
$R$ consists of the set of all pairs $(Pat_1, Pat_2)$ of patterns with
components included in $C_1$, $C_2$, respectively, to degrees
$p'_1 \ge p_1$, $p'_2 \ge p_2$, where $p_1$, $p_2$ appear on the left-hand
side of some of the generated decision rules.

The second method is based on another experimental decision table
$(U, A, d)$, where objects are triples $(x, y, f(x, y))$ composed of objects
$x$, $y$ and the result of $f$ on the arguments $x$, $y$; attributes from $A$
describe features of the arguments of objects, and the decision $d$ is equal
to the degree to which the elementary granule corresponding to the description
of $f(x, y)$ by means of attributes is at least included in $C$. This table is
extended by adding new features, namely the characteristic functions
$a_{Pat_i}$ of the patterns $Pat_i$ discovered in the first step. Next, the
attributes from $A$ are deleted, and from the resulting decision table
decision rules of a special form are induced: if $a_{Pat_1} = 1 \wedge
a_{Pat_2} = 1$ then $d \ge p$, where if $Pat_1$, $Pat_2$ are included in
$C_1$, $C_2$ to a degree of at least $p_1$, $p_2$, respectively, and $Pat'_1$,
$Pat'_2$ are included in $C_1$, $C_2$ to degrees $p'_1 \ge p_1$ and
$p'_2 \ge p_2$, respectively, then the decision rule obtained from the above
rule by replacing $Pat_1$, $Pat_2$ with $Pat'_1$, $Pat'_2$ is also true in the
considered decision system. The decision rules describe constraints specifying
the constraint relation $R$. Certainly, in the search procedures one should
also consider constraints on the pattern quality.

The search methods discussed in this section return local granule
decomposition schemes. These local schemes can be composed using techniques
discussed in [46.10]. The resulting schemes of granule construction (which can
also be treated as approximate reasoning schemes) also have the following
stability (robustness) property: if the input granules are sufficiently close
to the input concepts, then the output granule is sufficiently included in the
target concept, provided this property is preserved locally [46.10].



Conclusions 

We have discussed methods for the decomposition of information granules as a
way to extract from data the productions used to derive AR-schemes. Searching
for relevant patterns for information granule decomposition can be based on
methods for tuning parameters of rough set approximations of fuzzy cuts or
concepts defined by differences between cuts [46.13], [46.16], i.e., by using
so-called rough-fuzzy granules. In this case, pattern languages consist of
parameterized expressions describing the rough set approximations of parts of
fuzzy concepts, these parts being fuzzy cuts or differences between cuts.
Hence, an interesting research direction arises, related to the development of
new hybrid rough-fuzzy methods and aimed at algorithmic methods for rough set
approximations of those parts of fuzzy sets that are relevant for information
granule decomposition.

In our further study we plan to implement the proposed strategies and
test them on the above-mentioned real-life data. This will require: (i) the
development of ontologies for the considered applications, (ii) further
development of methods for extracting productions from data on the basis of
decomposition, and (iii) synthesis methods for AR-schemes from productions.
These methods will make it possible to reason by means of sensor measurements
along inference schemes over ontologies (i.e., inference schemes over some
standards) by means of the AR-schemes attached to them, discovered from
background knowledge (including ontologies) and experimental data.

Approximate Bayesian networks are applied to the construction of new case
classification schemes. The main issues of their extraction from empirical
data are discussed.



47.1 Introduction 

A Bayesian network (BN) is a directed acyclic graph (DAG) designed to
represent knowledge about probabilistic conditional independence statements
between features ([47.4]). One can model data by extracting approximate
BNs with as few edges as possible that still approximately preserve the
information entropy of the data (cf. [47.9]). This idea agrees with the common
principle of preferring the shortest possible descriptions of models, which is
assumed to provide the best knowledge generalization abilities ([47.2, 47.5,
47.6, 47.7]).

We show how a methodology based on approximate Bayesian networks can
be applied to the new case classification problem. We introduce a
Bayesian-like decision model, which classifies new cases along the structure
of a BN with the decision attribute as its root.



47.2 Frequencies in Data 

It is assumed that data can be represented as an information system
$\mathbb{A} = (U, A)$, where each attribute $a \in A$ is identified with a
function $a : U \to V_a$, with $V_a$ denoting the set of values of $a$. Let us
write $A = (a_1, \ldots, a_n)$ according to some ordering over the set of
attributes. For any $B \subseteq A$, the function $B : U \to V_B$ labels
objects $u \in U$ with vectors $B(u) = (a_{i_1}(u), \ldots, a_{i_m}(u))$ of
values of the successive attributes $a_{i_j} \in B$, $j = 1, \ldots, m$. The
set $V_B = \{B(u) : u \in U\}$ gathers all vectors of values on $B$ that occur
in $\mathbb{A}$.

Reasoning about data can be stated, e.g., as the classification problem
concerning a distinguished decision to be predicted under information
provided over the rest of the attributes. For this purpose, one represents
data as a decision table $\mathbb{A} = (U, A \cup \{d\})$, $d \notin A$. To
express conditions→decision dependencies, one can use frequencies of
occurrence of $v_d \in V_d$ conditioned by $w_B \in V_B$, of the form

$P_\mathbb{A}(v_d / w_B) = \frac{|\{u \in U : B(u) = w_B \wedge d(u) = v_d\}|}{|\{u \in U : B(u) = w_B\}|}$   (47.1)



Then, for a given $\alpha \in [0, 1]$, the $\alpha$-inexact decision rule
$B = w_B \Rightarrow_\alpha d = v_d$ is satisfied iff
$P_\mathbb{A}(v_d / w_B) \ge \alpha$, i.e., iff for at least
$\alpha \cdot 100\%$ of the objects $u \in U$ such that $B(u) = w_B$ we also
have $d(u) = v_d$. The strength of the rule is given by the quantity
$P_\mathbb{A}(w_B) = |\{u \in U : B(u) = w_B\}| / |U|$. It corresponds to the
chance that an object $u \in U$ will be recognized, i.e., that it will satisfy
the left side of the rule.
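
The quantities above can be computed directly from a table. The following is a
minimal sketch, assuming the decision table is stored as a list of
dictionaries mapping attribute names to values; the function names and the
toy data are illustrative assumptions.

```python
def frequency(table, B, w_B, d, v_d):
    """P(v_d / w_B) as in (47.1); returns None if no object matches w_B."""
    matching = [row for row in table
                if tuple(row[a] for a in B) == tuple(w_B)]
    if not matching:
        return None
    return sum(1 for row in matching if row[d] == v_d) / len(matching)


def strength(table, B, w_B):
    """P(w_B): fraction of objects satisfying the left side of the rule."""
    hits = sum(1 for row in table
               if tuple(row[a] for a in B) == tuple(w_B))
    return hits / len(table)


def rule_holds(table, B, w_B, d, v_d, alpha):
    """True iff the alpha-inexact rule 'B = w_B => d = v_d' is satisfied."""
    freq = frequency(table, B, w_B, d, v_d)
    return freq is not None and freq >= alpha


# Hypothetical toy decision table with conditions a, b and decision d.
toy = [
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "no"},
]
print(frequency(toy, ["a", "b"], (1, 0), "d", "yes"))
print(strength(toy, ["a", "b"], (1, 0)))
```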

Frequencies were introduced to rough sets as rough membership functions 
([47.3]). The rough set principle of reduction of possibly large amount of 
redundant information takes in their context the following form: 

Definition 47.2.1. Given $\mathbb{A} = (U, A \cup \{d\})$, we say that
$B \subseteq A$ preserves the frequency of $d$ iff for each $u \in U$ we have
$P_\mathbb{A}(d(u) / B(u)) = P_\mathbb{A}(d(u) / A(u))$. If, additionally,
there is no proper subset of $B$ satisfying such a condition, then $B$ is
called a frequency decision reduct.

Several alternative definitions of a frequency-based reduct have been proposed
within the rough set framework (cf. [47.5, 47.6]). The following aspects of
the adaptation of frequencies to the rough set methodology are worth
mentioning:

Remark 47.2.1. If we treat $P_\mathbb{A}$ as the empirical probability for the
product space over the set of random variables $A \cup \{d\}$, then preserving
the frequency of $d$ by $B$ means that $d$ is independent of $A \setminus B$
conditioned by $B$. So, each frequency decision reduct is actually a Markov
boundary of $d$ within $A$ ([47.4]).

Remark 47.2.2. The frequency distribution provides the basis for expressing
inexact dependencies in various ways. For instance, the set approximations or
generalized decision functions developed directly within rough sets ([47.2])
can be derived from $P_\mathbb{A}$ (cf. [47.8]).



47.3 Approximate Independence 

The condition for preserving frequency turns out to be too rigorous with
respect to possible noise or fluctuations in real-life data. This is the
general problem of dealing with probabilistic conditional independence (PCI)
while analyzing empirical data. In [47.9] an information entropy-based
approximation of PCI was proposed.

Definition 47.3.1. Let $\mathbb{A} = (U, A)$ and $X, Y \subseteq A$ be given.
The entropy of $X$ conditioned by $Y$ is defined by

$H_\mathbb{A}(X/Y) = -\frac{1}{|U|} \sum_{u \in U} \log_2 P_\mathbb{A}(X(u)/Y(u))$   (47.2)




366 D. Sl^zak 



Definition 47.3.2. For $\varepsilon \in [0, 1)$, $\mathbb{A} = (U, A)$,
$X, Y, Z \subseteq A$, we say that $X$ is $\varepsilon$-approximately
independent of $Z$ conditioned by $Y$ (we denote this fact by
$I_\mathbb{A}^\varepsilon(X/Y/Z)$) iff

$H_\mathbb{A}(X/Y) + \log_2(1 - \varepsilon) \le H_\mathbb{A}(X/Y \cup Z)$   (47.3)

If $\mathbb{A}$ takes the form $\mathbb{A} = (U, A \cup \{d\})$, we say that
$B \subseteq A$ $\varepsilon$-approximately preserves the frequency of $d$ iff
$I_\mathbb{A}^\varepsilon(d / B / A \setminus B)$ holds. If, additionally,
there is no proper subset of $B$ satisfying such a condition, then $B$ is
called an $\varepsilon$-approximate frequency decision reduct.
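
Definitions 47.3.1 and 47.3.2 translate into a few lines of code. The sketch
below assumes that the information system is a list of dictionaries and that
attribute sets are passed as lists of names; the helper names and the small
numerical tolerance `tol` are illustrative assumptions.

```python
from collections import Counter
from math import log2


def project(row, attrs):
    """Vector of values of the attributes `attrs` for one object."""
    return tuple(row[a] for a in attrs)


def conditional_entropy(table, X, Y):
    """H(X/Y) = -(1/|U|) * sum over u of log2 P(X(u)/Y(u)), as in (47.2)."""
    joint = Counter((project(r, X), project(r, Y)) for r in table)
    cond = Counter(project(r, Y) for r in table)
    total = sum(
        log2(joint[(project(r, X), project(r, Y))] / cond[project(r, Y)])
        for r in table
    )
    return -total / len(table)


def approx_independent(table, X, Y, Z, eps, tol=1e-12):
    """Check inequality (47.3) for eps in [0, 1)."""
    YZ = list(Y) + [z for z in Z if z not in Y]
    left = conditional_entropy(table, X, Y) + log2(1 - eps)
    return left <= conditional_entropy(table, X, YZ) + tol


# Hypothetical toy data: d copies a, while b carries no information about d.
toy = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 0, "b": 1, "d": 0},
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 1, "d": 1},
]
print(approx_independent(toy, ["d"], ["a"], ["b"], eps=0.0))
print(approx_independent(toy, ["d"], ["b"], ["a"], eps=0.4))
```

In the toy data, {a} preserves the frequency of d exactly, whereas {b} does so
only for thresholds of at least 0.5.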

Proposition 47.3.1. The notions of a frequency decision reduct and a
0-approximate frequency decision reduct are equivalent.

According to Remark 47.2.1, $\varepsilon$-approximate frequency decision
reducts can be treated as $\varepsilon$-approximate Markov boundaries of $d$.
By tuning $\varepsilon \in [0, 1)$, we can search for smaller boundaries
"$\varepsilon$-almost" preserving entropy-based information about the
decision.

Theorem 47.3.1. Let $\varepsilon \in [0, 1)$ be given. The problem of finding
a minimal $\varepsilon$-approximate frequency decision reduct is NP-hard.

One can deal with the above problem by adapting the techniques developed
in [47.5, 47.6], devoted to searching for decision reducts of various types.



47.4 Bayesian Classification 

One of the aims of searching for approximate reducts is to improve the
ability to classify new cases. Any $B \subseteq A$ corresponds to a bunch of
possibly inexact rules
$B = B(u) \Rightarrow_{P_\mathbb{A}(d(u)/B(u))} d = d(u)$ indexed by
successive objects $u \in U$. If $B$ is an $\varepsilon$-approximate frequency
decision reduct, then the elements of the above bunch imply particular
decision classes in a way "$\varepsilon$-close" to decision rules based on the
whole of $A$. If $B$ is substantially smaller than $A$, then the rules
generated by $B$ are shorter and stronger. Thus, they usually recognize new
cases more effectively.

The classification process can also correspond to rules with the decision
situated on their left sides. This is the case for the Bayesian methods (cf.
[47.1, 47.4, 47.7]). A new case with values equal to those of some object
$u \in U$ can be classified as, e.g., having the decision value
$v = \arg\max_{v_d \in V_d} P_\mathbb{A}(A(u)/v_d)$, i.e., the value of $d$
for which the observed vector on $A$ occurs most frequently in $\mathbb{A}$.
To improve the ability to recognize new cases, one can set up an ordering
$A = (a_1, \ldots, a_n)$ and note that

$P_\mathbb{A}(A(u)/d(u)) = \prod_{i=1}^{n} P_\mathbb{A}(a_i(u) / d(u), a_1(u), \ldots, a_{i-1}(u))$   (47.4)

Proposition 47.4.1. Let $\mathbb{A} = (U, A \cup \{d\})$ with the ordering
$A = (a_1, \ldots, a_n)$ be given. Assume that for each $i = 1, \ldots, n$, a
frequency decision reduct $B_i$ for the table
$\mathbb{A}_i = (U, \{d, a_1, \ldots, a_{i-1}\} \cup \{a_i\})$ is provided.
For each $u \in U$, we have

$\arg\max_{v_d \in V_d} P_\mathbb{A}(A(u)/v_d) = \arg\max_{v_d \in V_d} \prod_{i :\, d \in B_i} P_\mathbb{A}(a_i(u) / v_d, (B_i \setminus \{d\})(u))$   (47.5)

According to the above equality, there is no need to consider probabilities
corresponding to subsets $B_i$ not including $d$, since they do not depend on
the choice of the decision value. We thus obtain a new formula for new case
classification, which is equivalent to the previous one over the vectors
occurring in the data. For combinations that do not occur in the data, one has
to rely on the generalization abilities of the classification scheme based on
the right side of (47.5). Obviously, these abilities could be further improved
by considering the subsets $B_i$ as approximate decision reducts. Then,
however, one must remember that the outcomes of classification based on the
right side of (47.5) would be just "$\varepsilon$-close" to those obtained by
applying the left one.



47.5 Approximate Bayesian Networks 

The ordered frequency models turn out to be closely related to the notion of 
a Bayesian network ([47.4]) - a tool for the graphical representation of kno- 
wledge about probabilistic independence statements, by using the structure 
of a directed acyclic graph (DAG). In its approximate form, the notion of a 
Bayesian network can be defined as follows: 

Definition 47.5.1. For given $\varepsilon$ and $\mathbb{A} = (U, A)$, a DAG
$D = (A, \overrightarrow{E})$ is called an $\varepsilon$-approximate Bayesian
network (an $\varepsilon$-BN, for short) iff

$\forall_{X, Y, Z \subseteq A}\ [(X/Y/Z)_D \Rightarrow I_\mathbb{A}^\varepsilon(X/Y/Z)]$   (47.6)

where by $(X/Y/Z)_D$ we mean that $X$ is d-separated from $Z$ by $Y$, i.e.,
that any path between any elements of $X$ and $Z$ goes through (1) a serial or
diverging connection covered by some node in $Y$, or (2) a converging
connection not in $Y$, with no directed path towards any node in $Y$.

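Theorem 47.5.1. Given $\varepsilon$, $\mathbb{A} = (U, A)$ and a DAG
$D = (A, \overrightarrow{E})$, let us define the entropy of $D$ by

$H_\mathbb{A}(D) = \sum_{a \in A} H_\mathbb{A}(a / \{b \in A : (b, a) \in \overrightarrow{E}\})$   (47.7)

If the inequality $H_\mathbb{A}(D) + \log_2(1 - \varepsilon) \le H_\mathbb{A}(A)$
holds, then $D$ is an $\varepsilon$-BN for $\mathbb{A}$.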
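
The sufficient condition of Theorem 47.5.1 is easy to evaluate numerically.
The following sketch assumes that the DAG is given as a dictionary mapping
each attribute to the list of its parents and that the data are a list of
dictionaries; it does not verify acyclicity, and all names are illustrative
assumptions.

```python
from collections import Counter
from math import log2


def conditional_entropy(table, X, Y):
    """H(X/Y) as in (47.2); Y may be empty, which gives the entropy of X."""
    def proj(row, attrs):
        return tuple(row[a] for a in attrs)

    joint = Counter((proj(r, X), proj(r, Y)) for r in table)
    cond = Counter(proj(r, Y) for r in table)
    total = sum(log2(joint[(proj(r, X), proj(r, Y))] / cond[proj(r, Y)])
                for r in table)
    return -total / len(table)


def dag_entropy(table, parents):
    """H(D) as in (47.7): sum of entropies of nodes given their parents."""
    return sum(conditional_entropy(table, [a], list(ps))
               for a, ps in parents.items())


def passes_entropy_test(table, parents, eps, tol=1e-12):
    """Sufficient condition: H(D) + log2(1 - eps) <= H(A), eps in [0, 1)."""
    all_attrs = list(parents)
    h_all = conditional_entropy(table, all_attrs, [])
    return dag_entropy(table, parents) + log2(1 - eps) <= h_all + tol


# Hypothetical toy data: d copies a, and b is unrelated to both.
toy = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 0, "b": 1, "d": 0},
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 1, "d": 1},
]
with_edge = {"a": [], "b": [], "d": ["a"]}   # single edge a -> d
no_edges = {"a": [], "b": [], "d": []}       # empty graph
print(passes_entropy_test(toy, with_edge, eps=0.0))
print(passes_entropy_test(toy, no_edges, eps=0.25))
```

Since the theorem states only a sufficient condition, a failed test does not
by itself show that the graph is not an approximate Bayesian network.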

One can consider Bayesian nets for decision tables as well. Actually, the
construction of the product on the right side of (47.5), based on reducts
calculated along a given ordering, corresponds to the structure of a BN over
$A \cup \{d\}$, with the root in $d$. Theorem 47.5.1, applied to decision
tables, leads to the conclusion that similar classification schemes based on
approximate reducts may also be worth considering.

Definition 47.5.2. Let $\varepsilon$ and $\mathbb{A} = (U, A \cup \{d\})$,
$A = (a_1, \ldots, a_n)$, be given. We say that $\mathbb{B} = (B_1, \ldots,
B_n)$ is an $\varepsilon$-approximate ordered frequency model for
$\mathbb{A}$ iff there are thresholds $\varepsilon_1, \ldots, \varepsilon_n
\in [0, 1)$ satisfying the inequality
$(1 - \varepsilon_1) \cdot \ldots \cdot (1 - \varepsilon_n) \ge 1 - \varepsilon$,
such that $B_i$ is an $\varepsilon_i$-approximate frequency decision reduct
for $\mathbb{A}_i = (U, \{d, a_1, \ldots, a_{i-1}\} \cup \{a_i\})$, for each
$i = 1, \ldots, n$.

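Proposition 47.5.1. Let $\varepsilon$ and $\mathbb{A} = (U, A \cup \{d\})$ be
given. Any $\varepsilon$-approximate ordered frequency model
$\mathbb{B} = (B_1, \ldots, B_n)$ induces the $\varepsilon$-BN
$D = (A \cup \{d\}, \overrightarrow{E})$ defined by putting
$\overrightarrow{E} = \bigcup_{i=1}^{n} \{(b, a_i) : b \in B_i\}$.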



47.6 Conclusions 

We presented a Bayesian-like classification model based on approximate
frequency decision reducts, extracted from training data with respect to an
ordering over the conditional attributes. It turned out to have much in
common with modeling data by the approximate Bayesian networks introduced in
[47.9]. We believe that the presented methodology will provide new
possibilities for applying Bayesian networks to real-life data analysis.

This paper presents a formal approach to identifying partial adaptability
of software components. First we discuss the partial adaptability of
components with the same arity (or interface) as a requirement. Then we extend
the approach to components with different arities. A Rough Set Theory
(RST)-like method is used to identify the algebraic equivalency between the
components and the requirements, on which the adaptability is based.



48.1 Introduction 

Component-based development brings many advantages but also several new
difficulties. One of the critical difficulties is that there are no
appropriate ways to identify adaptable components, since there are no
comprehensive measures for evaluating the adaptation of software components.

While most previous component-based approaches focused on full adaptability
of components, this paper discusses partial adaptability and component
collaboration.



48.2 Defining Adaptation of Software Components 

There are many aspects to software requirements; however, the most essential
and imperative one is functional adaptation, which means that each software
component performs the desired data transformation [48.5]. Since requirements
and software components deal with many types of data in order to define
their functionality, the above transformation rules are expressed in the
form of $S$-sorted functions which compose a (many-sorted) $\Sigma$-algebra
[48.1].

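A $\Sigma$-algebra provides an interpretation for the signature
$\Sigma = (S, \Omega)$, where $S$ is a set of sorts and $\Omega$ is an
$S^* \times S$-sorted set of operation names. $S^*$ is the set of finite
sequences of elements of $S$. A $\Sigma$-algebra is an algebra $(A, F)$, where

1. $A = \{A_\sigma \mid \sigma \in S\}$ (a set of carriers) and

2. $F = \{f_A \mid f \in \Omega_{\sigma_1 \ldots \sigma_n, \sigma}\}$, $f_A : A_{\sigma_1} \times \cdots \times A_{\sigma_n} \to A_\sigma$.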

The $S$-sorted function $f_A$ is said to have arity $\sigma_1 \ldots \sigma_n$
and result sort $\sigma$. Operational equivalency between two
$\Sigma$-algebras $\mathcal{A}$ and $\mathcal{B}$ is evaluated by a
$\Sigma$-homomorphism, which is defined as a family of functions
$\eta = \{\eta_\sigma \mid \sigma \in S,\ \eta_\sigma : A_\sigma \to B_\sigma\}$
such that

$\forall f \in \Omega_{\sigma_1 \ldots \sigma_n, \sigma}\ \forall a_i \in A_{\sigma_i}\ [\eta_\sigma(f_A(a_1, \ldots, a_n)) = f_B(\eta_{\sigma_1}(a_1), \ldots, \eta_{\sigma_n}(a_n))]$

where $\mathcal{A} = (A, F)$ and $\mathcal{B} = (B, G)$ are $\Sigma$-algebras.
$f_A$ and $f_B$ are the functions in $F$ and $G$, respectively, or elements of
$A$ and $B$ if $n = 0$. Each requirement composes a $\Sigma$-algebra including
only one function, and so does each component. If the domain of definition of
such a function is finite, or countable, the function can be represented in
the form of a decision table [48.4].

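Assuming a requirement and a component

$f_{\sigma_1 \ldots \sigma_n, \sigma} : A_{\sigma_1} \times \cdots \times A_{\sigma_n} \to A_\sigma$

$g_{\sigma_1 \ldots \sigma_n, \sigma} : B_{\sigma_1} \times \cdots \times B_{\sigma_n} \to B_\sigma$

are given, the functions $f$ and $g$ are represented by decision tables as
shown in Table 48.2. In those tables, the $i$th row of the table of $f$ and
the $k$th row of the table of $g$ mean $f(u_{i1}, \ldots, u_{in}) = v_i$ and
$g(x_{k1}, \ldots, x_{kn}) = y_k$, respectively.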



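Table 48.1. Decision tables of the requirement $f$ and the component $g$

The table of $f$

$U^{(r)}$   $A_{\sigma_1}$ ... $A_{\sigma_j}$ ... $A_{\sigma_n}$   $A_\sigma$
1           $u_{11}$ ... $u_{1j}$ ... $u_{1n}$                     $v_1$
$i$         $u_{i1}$ ... $u_{ij}$ ... $u_{in}$                     $v_i$

The table of $g$

row         $B_{\sigma_1}$ ... $B_{\sigma_j}$ ... $B_{\sigma_n}$   $B_\sigma$
1           $x_{11}$ ... $x_{1j}$ ... $x_{1n}$                     $y_1$
$k$         $x_{k1}$ ... $x_{kj}$ ... $x_{kn}$                     $y_k$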



48.3 Identifying One-to-One Component Adaptation 

When a requirement is expressed in the form of an $S$-sorted function with the
arity $\sigma_1 \ldots \sigma_n$ and the result sort $\sigma$, the components
adaptable to it must have the same arity and result sort if the adaptation is
evaluated by a $\Sigma$-homomorphism. The carriers $A_{\sigma_j}$ and
$B_{\sigma_j}$ can be regarded as attributes which classify the rows of the
tables of $f$ and $g$ [48.3].

In order to identify a $\Sigma$-homomorphism from the requirement to the
component, we examine all the possible sets of mappings

$\{\eta_{\sigma_j} : A_{\sigma_j} \to B_{\sigma_j}\}$ $(j = 0, 1, \ldots, n)$

and there are ${}_{q_j}P_{p_j}$ candidate mappings $\eta_{\sigma_j}$ (the
number of injections from a $p_j$-element carrier into a $q_j$-element
carrier) if they are injections.

If $\{\eta_{\sigma_j}\}$ is a $\Sigma$-homomorphism, the formula

$\forall i \in U^{(r)}\ [g(\eta_{\sigma_1}(u_{i1}), \ldots, \eta_{\sigma_n}(u_{in})) = \eta_\sigma(f(u_{i1}, \ldots, u_{in}))]$   (48.1)

holds. This formula is equivalent to the following formula

$\forall i\ \exists k\ [x_{kj} = \eta_{\sigma_j}(u_{ij})\ (j = 1, \ldots, n) \text{ and } y_k = \eta_\sigma(v_i)]$   (48.2)

when $f$ and $g$ are expressed in the forms of Table 48.2, since the tables
include all the possible data transformations by $f$ and $g$, respectively.
Formula (48.2) implies that each row in the table of $f$ is mapped into a
single row in the table of $g$ by the $\Sigma$-homomorphism
$\{\eta_{\sigma_j}\}$. When such a $\Sigma$-homomorphism does not exist, we
have to reduce the requirement $f$ in the following way in order to define a
$\Sigma$-homomorphism for partial adaptation.

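1. Let $\mathcal{E} = \{\{\eta_{\sigma_j} \mid j = 0, 1, \ldots, n\}\}$ be the
set of all the possible sets of mappings

$\{\eta_{\sigma_j} : A_{\sigma_j} \to B_{\sigma_j}\}$ $(j = 0, 1, \ldots, n)$, where $\sigma_0 = \sigma$.

2. For each $\{\eta_{\sigma_j} \mid j = 0, 1, \ldots, n\}$, classify
$U^{(r)}$ for the requirement $f$ into $U_a^{(r)}(\{\eta_{\sigma_j}\})$ and
its complement, where

$U_a^{(r)}(\{\eta_{\sigma_j}\}) = \{i \mid \exists k\ [\eta_{\sigma_j}(u_{ij}) = x_{kj}\ (j = 1, \ldots, n) \text{ and } \eta_\sigma(v_i) = y_k]\}$

and the complement is $U^{(r)} - U_a^{(r)}(\{\eta_{\sigma_j}\})$.

3. Select the set of mappings
$\{\eta^*_{\sigma_j} \mid j = 0, 1, \ldots, n\} \in \mathcal{E}$ which
maximizes the cardinality of $U_a^{(r)}$, that is, for which

$\forall \{\eta_{\sigma_j}\} \in \mathcal{E}\ [card(U_a^{(r)}(\{\eta^*_{\sigma_j}\})) \ge card(U_a^{(r)}(\{\eta_{\sigma_j}\}))]$ holds.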

$U_a^{(r)}(\{\eta^*_{\sigma_j}\})$ represents the maximum adaptation of the
component $g$ to the requirement $f$. We denote
$U_a^{(r)}(\{\eta^*_{\sigma_j}\}) = U_a^{(r)}(g)$. By extracting the rows
belonging to $U_a^{(r)}(g)$ from the table of $f$, we can define the new
function $f^* \subseteq f$.
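
For small tables, steps 1 to 3 can be carried out by exhaustive search. The
sketch below assumes that each table is a list of (arguments, result) pairs,
that the carriers are the values occurring in the tables, that every mapping
is an injection, and that each argument position is treated as its own sort;
these simplifications and all names are assumptions made for illustration.

```python
from itertools import permutations, product


def injections(src, dst):
    """Yield every injective mapping from the set src into the set dst."""
    src = sorted(src, key=repr)
    for image in permutations(sorted(dst, key=repr), len(src)):
        yield dict(zip(src, image))


def max_adaptation(f_rows, g_rows):
    """Return (adapted row indices of f, mappings) maximizing the adaptation.

    f_rows, g_rows: lists of ((arg_1, ..., arg_n), result) pairs.
    The search is exponential and is meant for small tables only.
    """
    n = len(f_rows[0][0])
    g_table = {tuple(xs): y for xs, y in g_rows}
    # Carriers per argument position, followed by the result carrier.
    A = [{us[j] for us, _ in f_rows} for j in range(n)]
    A.append({v for _, v in f_rows})
    B = [{xs[j] for xs, _ in g_rows} for j in range(n)]
    B.append({y for _, y in g_rows})

    best_rows, best_maps = set(), None
    for maps in product(*(list(injections(a, b)) for a, b in zip(A, B))):
        *arg_maps, res_map = maps
        adapted = set()
        for i, (us, v) in enumerate(f_rows):
            key = tuple(m[u] for m, u in zip(arg_maps, us))
            # Row i is adapted if its image is a row of g (formula 48.2).
            if key in g_table and g_table[key] == res_map[v]:
                adapted.add(i)
        if best_maps is None or len(adapted) > len(best_rows):
            best_rows, best_maps = adapted, maps
    return best_rows, best_maps


# Hypothetical requirement (exclusive or) and two candidate components.
f_xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
g_neq = [(("F", "F"), "N"), (("F", "T"), "Y"),
         (("T", "F"), "Y"), (("T", "T"), "N")]
g_and = [(("F", "F"), "N"), (("F", "T"), "N"),
         (("T", "F"), "N"), (("T", "T"), "Y")]
print(len(max_adaptation(f_xor, g_neq)[0]))
print(len(max_adaptation(f_xor, g_and)[0]))
```

Here `g_neq` adapts to the whole requirement, while `g_and` adapts only
partially, so the returned row set plays the role of the restricted
requirement extracted from the table of f.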

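The $\Sigma$-algebra $\mathcal{B} = (B, G)$ is evidently $\Sigma$-homomorphic
to the $\Sigma$-algebra $\mathcal{A}^* = (A^*, F^*)$, where
$A^* = \{A^*_{\sigma_1}, \ldots, A^*_{\sigma_n}, A^*_\sigma\}$,
$F^* = \{f^*\}$,
$A^*_{\sigma_j} = \{x_{ij} \mid i \in U_a^{(r)}(g)\}$ and
$A^*_\sigma = \{y_i \mid i \in U_a^{(r)}(g)\}$. The function $f^*$ is referred
to as the restriction of $f$ to $A^*$.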

After examining all the functions in $\mathcal{G}$, that is, the set of
$S$-sorted functions with the same arity and result sort as the requirement
$f$, if

$f = \bigcup_{g \in \mathcal{G}} f^*$, or equivalently $U^{(r)} = \bigcup_{g \in \mathcal{G}} U_a^{(r)}(g)$

holds, the requirement $f$ is satisfiable by the set of functions
$\mathcal{G} = \{g\}$, using the set of $\Sigma$-homomorphisms
$\mathcal{E}^* = \{\{\eta^*_{\sigma_j} \mid j = 0, 1, \ldots, n\}\}$. When
implementing the requirement $f$ by the above set of components
$\mathcal{G} = \{g\}$, we need to know which part of the function $f$ (or of
the domain of definition of $f$) is satisfied by each component $g$.
Obviously, fewer components require less such knowledge, which is desirable
from a practical viewpoint.
Identifying the minimum set of components is a kind of set cover problem
with identical costs, and a near-optimal solution can be found by the greedy
method [48.2]. This method is expressed in the following way in our situation.

1. Select $g_1 \in \mathcal{G}$ which maximizes $card(U_a^{(r)}(g_1))$. Denote
this $U_a^{(r)}(g_1)$ as $U_1^*$.

2. Select $g_2 \in \mathcal{G}$ $(g_2 \ne g_1)$ which maximizes
$card(U_a^{(r)}(g_2) \cap \overline{U_1^*})$, where $\overline{U}$ means the
complementary set of $U$. Denote this $U_a^{(r)}(g_2)$ as $U_2^*$.

3. Repeat the above procedure. Each time we identify $g_i \in \mathcal{G}$
which maximizes
$card(U_a^{(r)}(g_i) \cap (\bigcap_{k=1}^{i-1} \overline{U_k^*}))$.

4. Terminate the iteration when we identify $g_n \in \mathcal{G}$ which
satisfies $U_1^* \cup \cdots \cup U_n^* = U^{(r)}$.

$\mathcal{G}^* = \{g_1, \ldots, g_n\}$ is then a near-minimum set of
components that satisfies the requirement $f$.
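
The greedy procedure above can be sketched as follows, assuming that the
adapted row set of every candidate component has already been computed and is
given as a dictionary from component names to sets of row indices; the
function name, the return value for uncoverable requirements, and the toy data
are illustrative assumptions.

```python
def greedy_cover(universe, adapted):
    """Greedy set cover over the rows of the requirement.

    universe: set of row indices of the requirement.
    adapted:  dict mapping a component name to the set of rows it adapts to.
    Returns the chosen component names in order, or None if the components
    together cannot cover the whole requirement.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        candidates = [g for g in adapted if g not in chosen]
        # Pick the component covering the most still-uncovered rows.
        best = max(candidates,
                   key=lambda g: len(adapted[g] & uncovered),
                   default=None)
        if best is None or not (adapted[best] & uncovered):
            return None
        chosen.append(best)
        uncovered -= adapted[best]
    return chosen


# Hypothetical adapted row sets of four components.
rows = {1, 2, 3, 4, 5, 6}
components = {
    "g1": {1, 2, 3, 4},
    "g2": {3, 4, 5},
    "g3": {5, 6},
    "g4": {1, 6},
}
print(greedy_cover(rows, components))
```

The greedy choice gives a near-optimal cover in general rather than a
guaranteed minimum one.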



48.4 Identifying One-to-Many Component Adaptation 

Even if a requirement cannot be satisfied by a set of components in the above
way, it may still be possible to satisfy it by the collaboration of several
components.

Assume there is a pair of components $(g_1, g_2)$ for a requirement $f$, which
satisfy

$f_{\sigma_1 \ldots \sigma_n, \sigma} : D \to A_\sigma \quad (D \subseteq A_{\sigma_1} \times \cdots \times A_{\sigma_n})$   (48.3)

$g_{1\,\sigma_1 \ldots \sigma_m, \rho} : E_1 \to B_\rho \quad (E_1 \subseteq B_{\sigma_1} \times \cdots \times B_{\sigma_m})$   (48.4)

$g_{2\,\sigma_{m+1} \ldots \sigma_n, \sigma} : E_2 \to B_\sigma \quad (E_2 \subseteq B_{\sigma_{m+1}} \times \cdots \times B_{\sigma_n})$   (48.5)

$\exists m'\ (m < m' \le n)\ [B_{\sigma_{m'}} = B_\rho]$   (48.6)

Since the order of the sorts in the requirement $f$, that is, the arity of
$f$, is arbitrary and does not affect the data transformation rule of $f$, we
can reorder the arity of $f$ in order to satisfy the above conditions.

When $g_1$ and $g_2$ are represented in the form of decision tables, we can
connect $g_1$ and $g_2$ in order to compose a new function with the same arity
and result sort as $f$ in the following way, supposing $f$ is represented in
the form of Table 48.2.

1. Let $\eta_{\sigma_j}$ and $\eta_\sigma$ be mappings

$\eta_{\sigma_j} : A_{\sigma_j} \to B_{\sigma_j}$ $(1 \le j \le n)$ and $\eta_\sigma : A_\sigma \to B_\sigma$.

2. Connect two rows $(x_{k1}, \ldots, x_{km})$ and
$(x_{k',m+1}, \ldots, x_{k'n})$ which satisfy

$x_{kj} = \eta_{\sigma_j}(u_{ij})$ $(j = 1, \ldots, m)$, $x_{k'j'} = \eta_{\sigma_{j'}}(u_{ij'})$ $(j' = m+1, \ldots, n)$, and $y_k = x_{k'm'}$.

3. Define the new decision table which is composed of the rows

$(x_{k1}, \ldots, x_{km}, x_{k',m+1}, \ldots, x_{k'n}, y_{k'})$.

There are multiple $\{\eta_{\sigma_j}\}$, as discussed in the previous
section; therefore, there could be multiple decision tables composed in the
above way. We denote the table with the maximum number of rows as $g^*$, which
represents a new $S$-sorted function.

We obtain a subset of $f$ corresponding to $g^*$, which is derived from the
table of $f$ by extracting the rows indexed by the $i$ selected in step 2 of
the above procedure. We denote this subset as $f^*$. The function $g^*$ is
$\Sigma$-homomorphic to $f^*$, as we discussed in Section 48.3.

By examining all the possible pairs of the above $g_1$ and $g_2$, we can
identify the set of $\Sigma$-homomorphic function pairs $\{(f^*, g^*)\}$. If

$\bigcup_{(g_1, g_2)} f^* = f$

holds, the requirement $f$ can be satisfied by the set of $S$-sorted functions
$\{(g_1, g_2)\}$. We can identify the minimum set of the above pairs
$\{(g_1, g_2)\}$ in a way similar to Section 48.3. This approach can be
extended to $S$-sorted function tuples $(g_1, \ldots, g_l)$ similarly.



48.5 Conclusions 

A formal approach to identifying software components adaptable to
requirements is proposed in this paper. The adaptation is evaluated by a
$\Sigma$-homomorphism between the requirements and the components. Unlike
previous approaches, we define partial adaptation based on decision tables
which represent the requirements and the components. We also defined two forms
of adaptation, that is, one-to-one adaptation and one-to-many adaptation.

This paper introduces a measure defined in the context of rough sets. Rough
set theory provides a variety of set functions that can be studied relative
to various measure spaces. In particular, the rough membership function is
considered. The particular rough membership function given in this paper
is a non-negative set function that is additive. It is an example of a rough
measure. The idea of a rough integral is revisited in the context of the
discrete Choquet integral that is defined relative to a rough measure. This
rough integral computes a form of ordered, weighted "average" of the values of
a measurable function. Rough integrals are useful in culling from a collection
of active sensors those sensors with the greatest relevance in a
problem-solving effort such as classification of a "perceived" phenomenon in
the environment of an agent.



49.1 Introduction 

This paper introduces a measure defined in the context of rough sets [49.3]. In
this paper, we investigate measures defined on a family $\wp(X)$ of all
subsets of a finite set $X$, i.e., on the powerset of $X$. A fundamental
paradigm in rough set theory is set approximation. Hence, there is interest in
discovering a family of measures useful in set approximation. By way of
practical application, an approach to fusion of homogeneous sensors deemed
relevant in a classification effort is considered (see, e.g., [49.6]).
Application of rough integrals has also been considered recently relative to
sensor signal classification by intelligent agents [49.8] and by web agents
[49.9]. This research also has significance in the context of granular
computing [49.10].

This paper is organized as follows. Section 49.2 presents a brief introduc- 
tion to classical additive set functions. Basic concepts of rough set theory are 
presented in Section 49.3. The discrete Choquet integral is defined relative 
to a rough measure in Section 49.4. A brief introduction to sensor relevance 
is given in Section 49.5. 
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49.2 Classical Additive Set Functions 

This section gives a brief introduction to one form of additive set functions 
in measure theory. Let card(X) denote the cardinality of a finite set X (i.e., 
the number of elements of X). 

Definition 49.2.1. Let X be a finite, non-empty set. A function 
$\lambda : \wp(X) \to \Re$, where $\Re$ is the set of all real numbers, is called a 
set function on X. 

Definition 49.2.2. Let X be a finite, non-empty set and let $\lambda$ be a set 
function on X. The function $\lambda$ is said to be additive on X iff 
$\lambda(A \cup B) = \lambda(A) + \lambda(B)$ for every $A, B \in \wp(X)$ such that 
$A \cap B = \emptyset$ (i.e., A and B are disjoint subsets of X). 

Definition 49.2.3. Let X be a finite, non-empty set and let $\lambda$ be a set 
function on X. The function $\lambda$ is said to be non-negative on X iff 
$\lambda(Y) \ge 0$ for any $Y \in \wp(X)$. 

Definition 49.2.4. Let X be a set and let $\lambda$ be a set function on X. The 
function $\lambda$ is said to be monotonic on X iff $A \subseteq B$ implies that 
$\lambda(A) \le \lambda(B)$ for every $A, B \in \wp(X)$. 
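
The three properties in Definitions 49.2.2 to 49.2.4 can be checked by brute force on a small universe. The following is a minimal sketch, assuming a set function is represented as a Python callable on frozensets; the helper names (`powerset`, `is_additive`, and so on) and the two example functions are illustrative and do not come from the paper.

```python
from itertools import chain, combinations

def powerset(xs):
    """Return all subsets of xs as frozensets."""
    xs = list(xs)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

def is_additive(lam, X, tol=1e-12):
    """lam(A | B) == lam(A) + lam(B) for all disjoint A, B (Def. 49.2.2)."""
    P = powerset(X)
    return all(abs(lam(A | B) - (lam(A) + lam(B))) <= tol
               for A in P for B in P if not (A & B))

def is_non_negative(lam, X):
    """lam(Y) >= 0 for every subset Y (Def. 49.2.3)."""
    return all(lam(Y) >= 0 for Y in powerset(X))

def is_monotonic(lam, X):
    """A subset of B implies lam(A) <= lam(B) (Def. 49.2.4)."""
    P = powerset(X)
    return all(lam(A) <= lam(B) for A in P for B in P if A <= B)

X = {1, 2, 3}
counting = lambda Y: len(Y)       # additive, non-negative, monotonic
squared = lambda Y: len(Y) ** 2   # monotonic and non-negative, but not additive
```

The exhaustive check visits all pairs of subsets, so it is only practical for very small X; it is meant as an executable reading of the definitions rather than a tool.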

The basic concepts of rough set theory, including an additive rough measure, 
are introduced in the next section. 



49.3 Basic Concepts of Rough Sets 

Rough set theory offers a systematic approach to set approximation [49.2]. To 
begin, let S = (U, A) be an information system where U is a non-empty, finite 
set of objects and A is a non-empty, finite set of attributes, where 
$a : U \to V_a$ for every $a \in A$. For each $B \subseteq A$, there is an associated 
equivalence relation $Ind_A(B)$ such that 

$$Ind_A(B) = \{(x, x') \in U^2 \mid \forall a \in B.\ a(x) = a(x')\}.$$

If $(x, x') \in Ind_A(B)$, we say that objects x and x' are indiscernible from 
each other relative to attributes from B. The notation $[x]_B$ denotes 
equivalence classes of $Ind_A(B)$. 

Definition 49.3.1. Let S = (U, A) be an information system, $B \subseteq A$, $u \in U$, 
and let $[u]_B$ be the equivalence class of $Ind_A(B)$ containing the object u. 
The set function 

$$\mu_u^B : \wp(U) \to [0, 1], \quad \text{where} \quad \mu_u^B(X) = \frac{card(X \cap [u]_B)}{card([u]_B)} \qquad (49.1)$$

for any $X \in \wp(U)$ is called a rough membership function (rmf). 

The form of rough membership function in Def. 49.3.1 is slightly different 
from the classical definition where the argument of the rough membership 
function is an object x and the set X is fixed [49.3]. 

Definition 49.3.2. Let $u \in U$. A non-negative and additive set function 
$\rho_u : \wp(X) \to [0, \infty)$ defined by $\rho_u(Y) = \rho'(Y \cap [u]_B)$ for $Y \in \wp(X)$, 
where $\rho' : \wp(X) \to [0, \infty)$, is called a rough measure relative to 
$U/Ind_A(B)$ and u on the indiscernibility space $(X, \wp(X), U/Ind_A(B))$. 

The rough membership function $\mu_u^B : \wp(X) \to [0, 1]$ is a non-negative set 
function [49.4]. 

Proposition 49.3.1. (Pawlak et al. [49.4]) The rough membership function 
$\mu_u^B$ as defined in Definition 49.3.1 (formula (49.1)) is additive on U. 

Proposition 49.3.2. $(X, \wp(X), U/Ind_A(B), \{\mu_u^B\}_{u \in U})$ is a rough measure 
space over X and B. 
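
Formula (49.1) can be illustrated on a toy information system. The sketch below is an illustration under stated assumptions: the objects, the single attribute `a`, and the function names are hypothetical, and exact fractions are used so that additivity can be checked without rounding.

```python
from collections import defaultdict
from fractions import Fraction

def indiscernibility_classes(U, attrs, B):
    """Partition U into classes of objects that agree on every attribute in B.
    attrs maps an attribute name to a dict {object: value}."""
    classes = defaultdict(set)
    for x in U:
        key = tuple(attrs[a][x] for a in sorted(B))
        classes[key].add(x)
    return [frozenset(c) for c in classes.values()]

def equivalence_class(u, U, attrs, B):
    """Return [u]_B, the class of Ind_A(B) that contains u."""
    return next(c for c in indiscernibility_classes(U, attrs, B) if u in c)

def rough_membership(u, U, attrs, B):
    """Return mu_u^B as a function on subsets X of U (formula 49.1)."""
    cls = equivalence_class(u, U, attrs, B)
    return lambda X: Fraction(len(set(X) & cls), len(cls))

# Hypothetical toy information system with one attribute "a".
U = ["x1", "x2", "x3", "x4", "x5"]
attrs = {"a": {"x1": 0, "x2": 0, "x3": 0, "x4": 1, "x5": 1}}
mu = rough_membership("x1", U, attrs, {"a"})   # [x1]_B = {x1, x2, x3}
```

On this example the two disjoint sets {x1, x4} and {x2, x3} receive the values 1/3 and 2/3, and their union receives 1, in line with Proposition 49.3.1.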

Other rough measures based on upper (lower) approximations are possible, 
but consideration of these other measures is outside the scope of this 
paper. 



49.4 Rough Integrals 

Rough integrals of discrete functions were introduced in [49.5]. In this section, 
we consider a variation of the Lebesgue integral, the discrete Choquet integral 
defined relative to a rough measure. In what follows, let $X = \{x_1, \ldots, x_n\}$ be 
a finite, non-empty set with n elements. The elements of X are indexed from 
1 to n. The notation $X_{(i)}$ denotes the set $\{x_{(i)}, x_{(i+1)}, \ldots, x_{(n)}\}$ where 
$i \ge 1$ and n = card(X). The subscript (i) is called a permutation index because 
the indices on elements of $X_{(i)}$ are chosen after a reordering of the elements 
of X. This reordering is "induced" by an external mechanism. 

Definition 49.4.1. Let $\rho$ be a rough measure on X where the elements of 
X are denoted by $x_1, \ldots, x_n$. The discrete Choquet integral of 
$f : X \to \Re^+$ with respect to the rough measure $\rho$ is defined by 

$$\int f \, d\rho = \sum_{i=1}^{n} \left( f(x_{(i)}) - f(x_{(i-1)}) \right) \rho(X_{(i)})$$

where $\cdot_{(i)}$ specifies that indices have been permuted so that 
$0 \le f(x_{(1)}) \le \cdots \le f(x_{(n)})$, $X_{(i)} := \{x_{(i)}, \ldots, x_{(n)}\}$, and $f(x_{(0)}) = 0$. 

This definition of the Choquet integral is based on a formulation in 
Grabisch [49.1], and applied in [49.2], [49.7]. The rough measure value 
$\rho(X_{(i)})$ serves as a "weight" of a coalition (or combination) of objects in 
the set $X_{(i)}$ relative to $f(x_{(i)})$. It should be observed that in general the 
Choquet integral has the effect of "averaging" the values of a measurable 
function. This averaging closely resembles the well-known Ordered Weighted 
Average (OWA) operator [49.11]. 

Proposition 49.4.1. Let $0 < s \le r$. If $a(x) \in [s, r]$ for all $x \in X_a$, then 
$\int a \, d\mu_u^e \in (0, r]$ where $u \in U$. 
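
The sum in Definition 49.4.1 can be computed directly once the elements are sorted by the value of f. The following is a minimal sketch, assuming f is given as a dictionary and the measure as a callable on frozensets; the class `{x1, x2, x3}` and the sample values are hypothetical.

```python
def choquet_integral(f, X, rho):
    """Discrete Choquet integral of f over a finite set X w.r.t. a set function rho.
    f: dict mapping each element of X to a non-negative value.
    rho: callable taking a frozenset and returning a number."""
    order = sorted(X, key=lambda x: f[x])    # f(x_(1)) <= ... <= f(x_(n))
    total, prev = 0.0, 0.0                   # prev plays the role of f(x_(0)) = 0
    for i, x in enumerate(order):
        tail = frozenset(order[i:])          # X_(i) = {x_(i), ..., x_(n)}
        total += (f[x] - prev) * rho(tail)
        prev = f[x]
    return total

# Rough membership measure relative to a hypothetical class [u] = {x1, x2, x3}.
cls = frozenset({"x1", "x2", "x3"})
rho = lambda Y: len(Y & cls) / len(cls)

f = {"x1": 0.2, "x2": 0.4, "x3": 0.3, "x4": 0.9}
X = list(f)
value = choquet_integral(f, X, rho)
```

Because the rough membership measure is additive, the Choquet integral here coincides with the ordinary weighted sum of f, which in this example is the mean of f over the class, (0.2 + 0.3 + 0.4)/3 = 0.3. The element x4 lies outside the class and contributes nothing.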



49.5 Relevance of a Sensor 



In this section, we briefly consider the measurement of the relevance of a 
sensor using a rough integral. A sensor is considered relevant in a classification 
effort in the case where $\int a \, d\mu_u^e$ for a sensor a is close enough to some 
threshold in a target interval of sensor values. Assume that a denotes a sensor 
that responds to stimuli with measurements that govern the actions of an agent. 
Let $\{a\} = B \subseteq A$ where $a : U \to [0, 0.5]$ and each sample sensor value a(x) 
is rounded to two decimal places. Let (Y, U − Y) be a partition defined by an 
expert and let $[u]_e$ denote a set in this partition containing u for a selected 
$u \in U$. We further assume the elements of $[u]_e$ are selected relative to an 
interval $(u - \varepsilon, u + \varepsilon)$ for a selected $\varepsilon > 0$. We assume a decision system 
$(X_a, a, e)$ is given for any considered sensor a such that $X_a \subseteq U$, 
$a : X_a \to \Re^+$, and e is an expert decision restricted to $X_a$ defining a 
partition $(Y \cap X_a, (U - Y) \cap X_a)$ of $X_a$. Moreover, we assume that 
$X_a \cap [u]_e \neq \emptyset$. The set $[u]_e$ is used to classify sensors and is given 
the name "classifier". Let $\bar{u}$ denote the average value in the classifier 
$[u]_e$, and let $\delta \in [0, 1]$. Then, for example, the selection R of the most 
relevant sensors in a set of sensors is found using 

$$R = \left\{ a \in B : \left| \int a \, d\mu_u^e - a(\bar{u}) \right| \le \delta \right\}.$$

In effect, the integral $\int a \, d\mu_u^e$ serves as a filter inasmuch as it "filters" 
out all sensors with integral values not close enough to $a(\bar{u})$. 
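
The filtering role of the integral can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the sensor names, readings, and classifier set are hypothetical, and the comparison value written $a(\bar{u})$ in the text is simply passed in as the number `target`.

```python
def choquet_integral(f, rho):
    """Discrete Choquet integral of the dict f w.r.t. the set function rho."""
    order = sorted(f, key=f.get)
    total, prev = 0.0, 0.0
    for i, x in enumerate(order):
        total += (f[x] - prev) * rho(frozenset(order[i:]))
        prev = f[x]
    return total

def relevant_sensors(readings, classifier, target, delta):
    """Keep the sensors whose rough integral lies within delta of target.
    readings: dict sensor name -> dict {object: sensor value}.
    classifier: the set [u]_e used to build the rough membership measure."""
    cls = frozenset(classifier)
    rho = lambda Y: len(Y & cls) / len(cls)
    return {name for name, f in readings.items()
            if abs(choquet_integral(f, rho) - target) <= delta}

# Hypothetical readings for two sensors over four sample objects.
readings = {
    "a1": {"x1": 0.20, "x2": 0.22, "x3": 0.21, "x4": 0.45},
    "a2": {"x1": 0.05, "x2": 0.40, "x3": 0.10, "x4": 0.21},
}
selected = relevant_sensors(readings, {"x1", "x2", "x3"}, target=0.21, delta=0.02)
```

In this example the integral for a1 is 0.21 and the integral for a2 is about 0.183, so only a1 passes the filter for a tolerance of 0.02.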



49.6 Conclusion 

Rough set theory provides a variety of set functions that can be studied rela- 
tive to various measure spaces. In particular, the rough membership function 
is considered. The particular rough membership function given in this paper 
is a non-negative set function which is additive and, hence, is an example of 
a rough measure. We are interested in identifying those sensors considered 
relevant in a problem-solving effort. The rough integral introduced in this 
paper serves as a means of distinguishing relevant and non-relevant sensors 
in a classification effort. 



In "real world" databases, attribute domains are more than Cantor sets; the 
additional semantics defined in this paper is assumed to be carried by a 
binary relation. Association rules in such databases are investigated. In this 
paper, we show that the cost of checking the additional semantics is rather 
small. Some experiments are reported. 



50.1 Introduction 

In relation theory, all attribute domains are assumed to be Cantor sets. 
However, in practice, they are "real world sets," that is, their elements 
interact among themselves. The question is: Can such interactions be modeled 
mathematically? In first order logic, the real world is modeled by a Cantor set 
with relational structure. We follow this approach; as a first step we consider 
the simplest case, that is, the relational structure is defined by a binary relation. 
Such "real world sets" have been called binary neighborhood system spaces, 
or BNS-spaces [50.4], [50.3]. 

This paper reports a study of association rules in such semantically rich 
relations. 



50.2 Relational Models and Rough Granular Structures 

A relation is a knowledge representation that maps each entity to a tuple of 
attribute values. Table 50.1 illustrates the knowledge representation of the 
universe $V = \{v_1, v_2, v_3, v_4, v_5\}$. 

In this view, an attribute can be regarded as a projection that maps 
entities to attribute values. For example, in Table 50.1 the CITY attribute is 
the map 

f : V → Dom(CITY), 

which assigns, at every tuple, the element in the first column to the element 
in the last column. The family of complete inverse images forms a 
partition (equivalence relation), so each column (attribute) defines an 
equivalence relation, and Table 50.1 gives rise to 4 named equivalence relations. 
Pawlak called the pair consisting of V and a finite family of equivalence 
relations a knowledge base. Since the term knowledge base often has a different 
meaning, we will call it a rough granular structure, or rough structure, which 
is a special form of binary granular structure [50.3]. 



Table 50.1. Information Table of Suppliers; arrows and parentheses will be 
suppressed 

V        (S#    SNAME    Status    City) 
v1   →   (S1    Smith    TWENTY    C1) 
v2   →   (S2    Jones    TEN       C2) 
v3   →   (S3    Blake    TEN       C2) 
v4   →   (S4    Clark    TWENTY    C1) 
v5   →   (S5    Adams    THIRTY    C3) 
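
The way each column of Table 50.1 induces a partition of V can be made concrete. The sketch below is a minimal illustration; the dictionary layout and the function name `partition` are choices made here, not notation from the paper.

```python
from collections import defaultdict

ATTRS = ("S#", "SNAME", "STATUS", "CITY")

# Rows of Table 50.1: entity -> tuple of attribute values.
TABLE = {
    "v1": ("S1", "Smith", "TWENTY", "C1"),
    "v2": ("S2", "Jones", "TEN", "C2"),
    "v3": ("S3", "Blake", "TEN", "C2"),
    "v4": ("S4", "Clark", "TWENTY", "C1"),
    "v5": ("S5", "Adams", "THIRTY", "C3"),
}

def partition(attr):
    """Group entities by their value of attr: the inverse images of the
    projection V -> Dom(attr), keyed by attribute value."""
    idx = ATTRS.index(attr)
    blocks = defaultdict(set)
    for entity, row in TABLE.items():
        blocks[row[idx]].add(entity)
    return dict(blocks)

city_blocks = partition("CITY")
status_blocks = partition("STATUS")
```

Each block is an equivalence class named by its attribute value, which is the sense in which the four columns give four named equivalence relations on V.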



50.3 Databases with Additional Semantics 

Relational theory assumes everything is a Cantor set. In other words, the 
interactions among real world objects (entities or attribute values, respectively) 
are "forgotten" in relational modeling. In practical database processing, 
additional semantics in attribute domains are often processed. For example, for 
numerical attributes, the order of numbers is often used in SQL statements 
by human operators. Therefore these additional semantics implicitly exist in 
the stored database. To capture such additional semantics in data mining, 
we need a data model that is semantically richer than the relational model. 

What would be the "correct" mathematical structure of real world objects? 
We will follow first order logic; the universe and attribute domains 
are assumed to have relational structures. As a first step, we will confine 
ourselves to the simplest kind of relational structure, namely, binary relations. 

In Table 50.2, we give an example of a binary relation defining "near"-semantics 
on CITY. Note that a binary relation B defines a binary neighborhood 
$B_p = \{x \mid x\,B\,p\}$ at every p; the family of such neighborhoods is called a 
binary granular structure. A relation with such additional semantics defines a 
binary granular structure on the universe V, that is, the universe is equipped 
with a finite family of named binary relations. 

Table 50.2. "Near"-Binary Relation 

CITY    CITY 
C1      C1 
C1      C2 
C2      C2 
C2      C1 
C2      C3 
C3      C3 
C3      C2 


Table 50.3. Binary Granular Structure; Relation with additional semantics 

The center    Elementary granule    Attribute value 
              (encoded label)       (meaningful name) 
*             S#(*)                 * 
*             SNAME(*)              * 
v1, v4        STATUS(10010)         TWENTY 
v2, v3        STATUS(01100)         TEN 
v5            STATUS(00001)         THIRTY 
v1, v4        CITY(11110)           C1 
v2, v3        CITY(11111) 



50.4 Mining Real World or Its Representations 

What is data mining? The common answer is essentially "to find the pattern 
in data." This is not entirely accurate; we would like to amend the notion as 
follows: The goal of data mining is to find patterns in the Real World, as 
represented by the given data. For convenience, we will denote the real world 
by RW and the data (knowledge representation) by KR. 

For example, we will not be interested in a discovered rule such as 

"all data are represented by 5 characters," 

because this is a pattern of KR, not RW. To show that a discovered pattern 
in a KR is, indeed, a pattern of RW, we need to show that the pattern is 
INVARIANT under attribute transformations. In other words, the pattern also 
exists in other knowledge representations. However, we can take the following 
alternate approach: 

Find the patterns in the mathematical structure, RW, of Real World. 

For a relational database, the mathematical structure of RW is the rough 
granular structure (or knowledge base, if we use Pawlak's terminology); see 
Table 50.3. If we conduct the data mining in such a structure, it is RW mining; 
no attribute transformations are needed. In this paper, we extend this 
approach to "real world" databases. 



50.5 Clustered Association Rules-Mining Semantically 

A machine-oriented model uses granules as its attribute values, so any logical 
formula is translated into a set-theoretical formula of granules. However, we 
should note that attribute values are semantically related, so in processing 
any logical formula based on attribute values, it is important to check 
continuity (namely, to see whether the formula respects the semantics). We will 
call any pattern or rule that respects the semantics a clustered pattern or 
clustered rule. Let c and d be two attribute values in a relation. 

Definition. Clustered association rules 

1. A pair (c, d) in a given relation is one-way (c → d) continuous (or 
clustered) if for every point x in the elementary neighborhood $B_c$ there is at 
least one y in $B_d$ such that (x, y) is in the given relation. 

2. A pair (c, d) in a given relation is two-way continuous (or clustered) if 
(c → d) and (d → c) are both continuous. 

3. Clustered association rule: A pair (c, d) is a clustered association rule iff 
the pair is an association rule and two-way continuous. 

4. Soft association rule: A pair (c, d) is a soft association rule if 
Card(NEIGH(c) ∩ NEIGH(d)) ≥ threshold [50.1], [50.2]. 
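
The continuity conditions in items 1 and 2 and the soft rule in item 4 can be sketched as follows. This is one possible reading under stated assumptions: the "given relation" is taken to be the set of (STATUS, CITY) value pairs occurring in Table 50.1, STATUS carries no extra semantics (each value is its own neighborhood), CITY uses the "near" neighborhoods of Table 50.2, and NEIGH(c) is read as the set of entities in the granule of c. All function and variable names are hypothetical.

```python
def one_way_continuous(c, d, nbr_c, nbr_d, relation):
    """(c -> d) is continuous if every x in B_c is related to some y in B_d."""
    return all(any((x, y) in relation for y in nbr_d[d]) for x in nbr_c[c])

def two_way_continuous(c, d, nbr_c, nbr_d, relation):
    """Both (c -> d) and (d -> c) are continuous."""
    inverse = {(y, x) for (x, y) in relation}
    return (one_way_continuous(c, d, nbr_c, nbr_d, relation)
            and one_way_continuous(d, c, nbr_d, nbr_c, inverse))

def soft_association(granule_c, granule_d, threshold):
    """Card(NEIGH(c) & NEIGH(d)) >= threshold, with granules as entity sets."""
    return len(granule_c & granule_d) >= threshold

# Assumed neighborhoods: trivial for STATUS, "near" for CITY.
STATUS_NBR = {"TWENTY": {"TWENTY"}, "TEN": {"TEN"}, "THIRTY": {"THIRTY"}}
CITY_NBR = {"C1": {"C1", "C2"}, "C2": {"C1", "C2", "C3"}, "C3": {"C2", "C3"}}

# (STATUS, CITY) pairs that occur in Table 50.1.
RELATION = {("TWENTY", "C1"), ("TEN", "C2"), ("THIRTY", "C3")}
```

Under these assumptions the pair (TWENTY, C1) is one-way continuous but not two-way continuous, because C2 is near C1 yet never occurs with TWENTY. The granules {v1, v4} for TWENTY and {v1, v2, v3, v4} for C1 share two entities, so the pair is a soft association rule for a threshold of 2.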

Here are some of our experimental results; see Table 50.4. 

Data characteristics and meaning of comments: (1) Rows: 100000; (2) Columns: 
16; (3) Support: 500 items; (4) Main memory size: 10 megabytes; 
(5) "Generated 56345 2-combinations" means "56345 candidates of length 2 
are generated"; (6) "4973 2-large granule" means "4973 candidates meet the 
support threshold"; (7) "Continuous 4973 2-association rules" means "4973 
continuous association rules of length 2 are checked." From the table, it is 
clear that the cost of checking continuity is small. 



50.6 Conclusion 

The advantages of data mining by granular computing are: 

1. It is fast. In mining classical relations, granular computing is faster than 
Apriori [50.5], [50.6] because database scans are replaced by bit 
operations. 

2. The use of granular computing is extended to "real world" databases 
(semantically richer relations), and its cost is small. Moreover, such extra 
semantics can be used to analyze unexpected rules [50.7]. 
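
The remark that database scans are replaced by bit operations can be illustrated with integer bitmaps. The sketch below is an illustration, not the authors' implementation: each granule is encoded as a Python integer with one bit per row, and the support of a combination of attribute values is the population count of the bitwise AND. The columns are taken from Table 50.1.

```python
def bitmap(column, value):
    """Encode the rows where column == value as an integer bit mask."""
    bits = 0
    for i, v in enumerate(column):
        if v == value:
            bits |= 1 << i
    return bits

def support(*granules):
    """Number of rows contained in every granule (AND, then count the 1 bits)."""
    combined = granules[0]
    for g in granules[1:]:
        combined &= g
    return bin(combined).count("1")

# STATUS and CITY columns of Table 50.1, in row order v1..v5.
STATUS = ["TWENTY", "TEN", "TEN", "TWENTY", "THIRTY"]
CITY = ["C1", "C2", "C2", "C1", "C3"]

ten_and_c2 = support(bitmap(STATUS, "TEN"), bitmap(CITY, "C2"))
twenty_and_c2 = support(bitmap(STATUS, "TWENTY"), bitmap(CITY, "C2"))
```

Once the bitmaps are built, counting a candidate of any length needs only AND operations on the stored masks, with no further pass over the rows.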
Table 50.4. Granular and neighborhood method: 



The filtration method in modal logic is considered to develop a way of 
formulating an aspect of granular reasoning, which, roughly speaking, means 
human reasoning based on granularity. The method, however, originates in 
purely logical problems like decidability. Then, for our purpose, an extended 
concept of relative filtration is newly introduced using lower and upper 
approximations in rough set theory. An example of a reasoning process using the 
relative filtration is illustrated. 



51.1 Introduction 

This paper aims to provide a small step toward formulating an aspect of granular 
reasoning. What, however, is granular reasoning? Although, as far as the 
authors know, there seems to be no consensus on what it means as a 
technical term, it would indicate some mechanism for reasoning using rough 
set theory (Pawlak [51.3]) and granular computing (Lin [51.2]). 

Our point of departure is the filtration method in modal logic (Chellas 
[51.1]). It is a standard way of proving finite determination and decidability. 
The basic idea of the filtration method is to generate a kind of quotient model 
from the original one so that its set of possible worlds is finite. Usually, the 
method is performed using a given finite set of sentences to which we pay 
attention with respect to purely logical problems. 

When, however, we deal with problems beyond pure logic, we often find 
ourselves paying attention to proper subsets of such a given set of sentences. 
For this purpose, we introduce a concept of relative filtration, an extended 
definition of filtration with approximation in rough set theory. Finally, we 
illustrate a formulation of human reasoning where a model is not kept fixed 
but is changed into a new one using the relative filtration whenever required. 



51.2 Preliminaries 

A modal language $\mathcal{L}_{ML}(\mathcal{P})$ is formed from a given countable set of atomic 
sentences $\mathcal{P}$ in the usual way with a standard set of logical connectives 
including the two modal operators. For a sentence p in $\mathcal{L}_{ML}(\mathcal{P})$, its 
subsentences are defined in the usual recursive way. Let Sub(p) be the set of 
subsentences of p. A set $\Gamma$ of sentences is said to be closed under 
subsentences just in case $Sub(p) \subseteq \Gamma$ for every p in $\Gamma$. Let 
$\mathcal{P}_p \stackrel{def}{=} \mathcal{P} \cap Sub(p)$. Also, for the set $\Gamma$ of sentences, let 
$\mathcal{P}_\Gamma \stackrel{def}{=} \bigcup_{p \in \Gamma} \mathcal{P}_p$. 

A Kripke model $\mathcal{M}$ is a structure $\langle W, R, V \rangle$, where W is a non-empty set 
of possible worlds, R is an accessibility relation on W, and V is a valuation, 
which assigns a subset of W to each atomic sentence p in $\mathcal{P}$. Define 
$\mathcal{M}, w \models p$ iff $w \in V(p)$. It means that a sentence p is true at a possible 
world w in $\mathcal{M}$. The relation $\models$ is extended to every compound sentence in 
the usual way. The set of possible worlds 
$\|p\|^{\mathcal{M}} \stackrel{def}{=} \{w \in W \mid \mathcal{M}, w \models p\}$ is called the proposition of p in $\mathcal{M}$. 

Given a Kripke model $\mathcal{M} = \langle W, R, V \rangle$, let $\Gamma$ be a set of sentences closed 
under subsentences. Two worlds w, w' in W are said to be $\Gamma$-equivalent, 
written $w \approx_\Gamma w'$, when, for every sentence p in $\Gamma$, $\mathcal{M}, w \models p$ iff 
$\mathcal{M}, w' \models p$. Then, a filtration of $\mathcal{M}$ through $\Gamma$ (or $\Gamma$-filtration of 
$\mathcal{M}$, for short) is defined as a structure 
$\mathcal{M}_\Gamma = \langle W_\Gamma, R_\Gamma, V_\Gamma \rangle$, where $W_\Gamma = W/{\approx_\Gamma}$, $R_\Gamma$ is a relation on 
$W_\Gamma$ satisfying several conditions (for details, see Chellas [51.1], p. 101), 
and, for each atomic sentence p in $\mathcal{P}_\Gamma$, 
$V_\Gamma(p) \stackrel{def}{=} \{[w]_{\approx_\Gamma} \mid w \in V(p)\}$. With respect to the filtration, the 
following remarkable result is obtained: for every p in $\Gamma$, 
$\mathcal{M} \models p$ iff $\mathcal{M}_\Gamma \models p$. Note that, if $|\Gamma| = n$, then $|W/{\approx_\Gamma}| \le 2^n$. 
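
The quotient construction can be sketched for the special case in which the set of sentences consists of atomic sentences only. This restriction, the six-world model, and the valuation below are assumptions made for illustration; the accessibility relation is omitted because the conditions on it are not spelled out here.

```python
def filtration(worlds, valuation, atoms):
    """Quotient the worlds by agreement on every atom in `atoms`.
    Returns the list of equivalence classes and the induced valuation,
    V_Gamma(p) = {[w] | w in V(p)}."""
    classes = {}
    for w in worlds:
        signature = tuple(w in valuation[p] for p in atoms)
        classes.setdefault(signature, set()).add(w)
    blocks = [frozenset(c) for c in classes.values()]
    v_gamma = {p: {b for b in blocks if b & valuation[p]} for p in atoms}
    return blocks, v_gamma

# Hypothetical model: six worlds and two atomic sentences.
W = ["w1", "w2", "w3", "w4", "w5", "w6"]
V = {"p1": {"w1", "w2", "w3"}, "p2": {"w1", "w4", "w5"}}
blocks, v_gamma = filtration(W, V, ["p1", "p2"])
```

Each world is identified by its tuple of truth values, so n atoms give at most $2^n$ classes; here two atoms produce four classes.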



51.3 Relative Filtration with Approximation 

Given a Kripke model $\mathcal{M} = \langle W, R, V \rangle$, let $\Gamma$ be a set of sentences closed 
under subsentences and $\Delta$ be a non-empty subset of $\Gamma$, which is also assumed 
to be closed under subsentences. In this paper, we call elements in $\Gamma$ and 
$\Gamma^c (= \mathcal{L}_{ML}(\mathcal{P}) \setminus \Gamma)$, respectively, explicit and implicit sentences. Also, 
we call elements in $\Delta$ and $\Gamma \setminus \Delta$, respectively, focal and marginal sentences. 

The filtration that we want to formulate in what follows contains the 
set $W_\Delta$ of possible worlds and the accessibility relation $R_\Delta$. The difference 
is its valuation (call it V', tentatively). Truth values for every focal atomic 
sentence in $\mathcal{P}_\Delta$ can be defined in the same way as in the usual $\Delta$-filtration 
of $\mathcal{M}$. In general, however, we cannot determine truth values for marginal 
atomic sentences in $\mathcal{P}_\Gamma \setminus \mathcal{P}_\Delta$. 

For example, consider the Kripke model $\mathcal{M}$ with $W = \{w_1, w_2, \ldots, w_6\}$ 
and valuation V given in Fig. 51.1. Let $\Gamma = \{p_1, p_2, p_3\} (= \mathcal{P}_\Gamma)$ and 
$\Delta = \{p_1, p_2\} (= \mathcal{P}_\Delta)$. Then, by the equivalence relation $\approx_\Delta$, the following 
quotient set of four new possible worlds (equivalence classes) is generated: 
$W_\Delta = \{[w_1]_{\approx_\Delta}, [w_2]_{\approx_\Delta}, [w_4]_{\approx_\Delta}, [w_6]_{\approx_\Delta}\}$. The truth values for $p_1$ and $p_2$ 
in $\mathcal{P}_\Delta$ can be assigned in the same way as in the usual $\Delta$-filtration because 
every element in each equivalence class (newly generated possible world) has the 
same truth value under the original valuation. Note that each of $\|p_1\|^{\mathcal{M}}$ and 
$\|p_2\|^{\mathcal{M}}$ can be written as a direct sum of $\approx_\Delta$-equivalence classes, 
where + denotes the direct sum. Thus, in terms of rough set theory [51.3], for 
every sentence p in $\Delta$, $\|p\|^{\mathcal{M}}$ is a $\approx_\Delta$-definable set. For the marginal 
sentence $p_3$, since $w \in V(p_3)$ for all w in $[w_1]_{\approx_\Delta}$ and $w \notin V(p_3)$ for all w 
in $[w_6]_{\approx_\Delta}$, we can assign, respectively, 1 and 0 to $p_3$ at the two newly 
generated worlds $[w_1]_{\approx_\Delta}$ and $[w_6]_{\approx_\Delta}$. For the new world $[w_2]_{\approx_\Delta}$, on the 
other hand, the two original worlds $w_2, w_3$ in $[w_2]_{\approx_\Delta}$ have different states: 
$w_2 \in V(p_3)$ and $w_3 \notin V(p_3)$. Thus we can no longer uniquely determine 
$V'(p_3)$ with respect to $[w_2]_{\approx_\Delta}$. We have the same result for 
$[w_4]_{\approx_\Delta} = \{w_4, w_5\}$. Hence, in general, we can give only a partial 
definition of V'. 

Fig. 51.1. Partiality of valuation when making relative filtration. 



Here note that 

$$[w_1]_{\approx_\Delta} \subseteq \|p_3\|^{\mathcal{M}} \subseteq [w_1]_{\approx_\Delta} + [w_2]_{\approx_\Delta} + [w_4]_{\approx_\Delta},$$

by which we have 

$$\underline{\approx_\Delta}(\|p_3\|^{\mathcal{M}}) = [w_1]_{\approx_\Delta}, \qquad \overline{\approx_\Delta}(\|p_3\|^{\mathcal{M}}) = [w_1]_{\approx_\Delta} + [w_2]_{\approx_\Delta} + [w_4]_{\approx_\Delta}$$

in terms of rough set theory. Thus, we can introduce the concept of 
approximation in rough set theory into relative filtration. This means that we 
have two kinds of definition of valuation in the following way. 

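Definition 51.3.1. For every explicit atomic sentence p in $\Gamma$, lower and 
upper valuation through $\Delta$ relative to $\Gamma$ (or, lower and upper 
$\Delta/\Gamma$-valuation, for short) are defined by, respectively, 

1. $\underline{V}_{\Delta/\Gamma}(p) \stackrel{def}{=} \{[w]_{\approx_\Delta} \mid [w]_{\approx_\Delta} \subseteq \|p\|^{\mathcal{M}}\} = \{[w]_{\approx_\Delta} \mid [w]_{\approx_\Delta} \subseteq \underline{\approx_\Delta}(\|p\|^{\mathcal{M}})\}$, 

2. $\overline{V}_{\Delta/\Gamma}(p) \stackrel{def}{=} \{[w]_{\approx_\Delta} \mid [w]_{\approx_\Delta} \cap \|p\|^{\mathcal{M}} \neq \emptyset\} = \{[w]_{\approx_\Delta} \mid [w]_{\approx_\Delta} \subseteq \overline{\approx_\Delta}(\|p\|^{\mathcal{M}})\}$. 

Thereby, we have the following two kinds of filtration: 

Definition 51.3.2. 

1. A lower filtration of $\mathcal{M}$ through $\Delta$ relative to $\Gamma$ (or, lower 
$\Delta/\Gamma$-filtration of $\mathcal{M}$, for short) is defined by 

$\underline{\mathcal{M}}_{\Delta/\Gamma} \stackrel{def}{=} \langle W_\Delta, R_\Delta, \underline{V}_{\Delta/\Gamma} \rangle$. 

2. An upper filtration of $\mathcal{M}$ through $\Delta$ relative to $\Gamma$ (or, upper 
$\Delta/\Gamma$-filtration of $\mathcal{M}$) is defined by 

$\overline{\mathcal{M}}_{\Delta/\Gamma} \stackrel{def}{=} \langle W_\Delta, R_\Delta, \overline{V}_{\Delta/\Gamma} \rangle$. 

Lemma 51.3.1. In lower and upper $\Delta/\Gamma$-filtrations of $\mathcal{M}$, for a marginal 
sentence p in the set difference $\Gamma \setminus \Delta$, we have, respectively, 

1. $\underline{\mathcal{M}}_{\Delta/\Gamma}, [w]_{\approx_\Delta} \models p \Rightarrow \mathcal{M}, w \models p$, 

2. $\mathcal{M}, w \models p \Rightarrow \overline{\mathcal{M}}_{\Delta/\Gamma}, [w]_{\approx_\Delta} \models p$. 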
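
The lower and upper valuations of Definition 51.3.1 can be computed on a small model. The sketch below is an illustration under stated assumptions: the valuation is hypothetical, chosen only so that the focal atoms p1 and p2 split six worlds into the classes {w1}, {w2, w3}, {w4, w5}, {w6} and so that the marginal atom p3 is true at w1, w2, and w4 only. It is not a transcription of Fig. 51.1.

```python
W = ["w1", "w2", "w3", "w4", "w5", "w6"]
# Hypothetical valuation (an assumption, not taken from the figure).
V = {"p1": {"w1", "w2", "w3"},
     "p2": {"w1", "w4", "w5"},
     "p3": {"w1", "w2", "w4"}}

def quotient(worlds, valuation, focal):
    """Equivalence classes of worlds that agree on every focal atom."""
    classes = {}
    for w in worlds:
        key = tuple(w in valuation[p] for p in focal)
        classes.setdefault(key, set()).add(w)
    return {frozenset(c) for c in classes.values()}

def lower_valuation(p, blocks, valuation):
    """Classes entirely inside ||p||: p is certainly true there."""
    return {b for b in blocks if b <= valuation[p]}

def upper_valuation(p, blocks, valuation):
    """Classes meeting ||p||: p is possibly true there."""
    return {b for b in blocks if b & valuation[p]}

blocks = quotient(W, V, ["p1", "p2"])
low = lower_valuation("p3", blocks, V)
up = upper_valuation("p3", blocks, V)
```

The classes in `up` but not in `low` are exactly those where the marginal sentence has mixed truth values among the original worlds, which is where the valuation is only partially defined.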



51.4 Example of Granular Reasoning 

Let us consider the following reasoning process: 

(pi) Socrates is Human. 

(p2) Human is Mortal. 

(c) Socrates is Mortal. 

First, we formulate a model. Given $\mathcal{P}$ = {Human, Mortal, ...}, consider a 
model $\mathcal{M} = \langle W, R, V \rangle$, where W = {Socrates, Plato, Tweety, Zeus, ...}, R is 
an arbitrary relation on W, and V is defined by 

V           Human    Mortal 
Socrates    1        1 
Plato       1        1 
Tweety      0        1 
Zeus        0        0 

Let $\Gamma$ = {Human, Mortal} and $\Delta$ = {Human}. At the first step, Premise (p1) 
can be translated into 

$$\mathcal{M}, \text{Socrates} \models \text{Human} \qquad (51.1)$$

in the usual way in rough set theory. At the second step, in order to 
translate Premise (p2), we need a lower Human/$\Gamma$-filtration. That is, if we define 
$\approx_{Human}$ by $w \approx_{Human} w'$ iff ($\mathcal{M}, w \models$ Human iff $\mathcal{M}, w' \models$ Human), then we have 

$$W/{\approx_{Human}} = \{ \|\text{Human}\|^{\mathcal{M}}, (\|\text{Human}\|^{\mathcal{M}})^c \}.$$

Then, we can translate Premise (p2) into 

$$\underline{\mathcal{M}}_{Human/\Gamma}, \|\text{Human}\|^{\mathcal{M}} \models \text{Mortal}. \qquad (51.2)$$

At the third step, by Formula (51.1), we have 

$$\|\text{Human}\|^{\mathcal{M}} = [\text{Socrates}]_{\approx_{Human}},$$

so, by Formula (51.2), we have 

$$\underline{\mathcal{M}}_{Human/\Gamma}, [\text{Socrates}]_{\approx_{Human}} \models \text{Mortal}. \qquad (51.3)$$

At the final step, by Lemma 51.3.1 and Formula (51.3), we can conclude 

$$\mathcal{M}, \text{Socrates} \models \text{Mortal}, \qquad (51.4)$$

which is just the translation of Conclusion (c). Hence, we can represent our 
example of reasoning by the following four steps: 

(51.1) $\mathcal{M}, \text{Socrates} \models \text{Human}$, 

(51.2) $\underline{\mathcal{M}}_{Human/\Gamma}, \|\text{Human}\|^{\mathcal{M}} \models \text{Mortal}$, 

(51.3) $\underline{\mathcal{M}}_{Human/\Gamma}, [\text{Socrates}]_{\approx_{Human}} \models \text{Mortal}$, 

(51.4) $\mathcal{M}, \text{Socrates} \models \text{Mortal}$. 
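
The four steps can be traced on the valuation table of the example. The following is a minimal sketch, assuming the set of worlds is restricted to the four that are listed explicitly (the source leaves the remainder open) and that truth in the lower filtration is read as containment of a class in the proposition; the helper names are hypothetical.

```python
# The four explicitly listed worlds (an assumed restriction) and their valuation.
W = ["Socrates", "Plato", "Tweety", "Zeus"]
V = {"Human": {"Socrates", "Plato"},
     "Mortal": {"Socrates", "Plato", "Tweety"}}

def block_of(w, focal):
    """Equivalence class of w: the worlds agreeing with w on every focal atom."""
    return frozenset(v for v in W
                     if all((v in V[p]) == (w in V[p]) for p in focal))

def lower_true(p, block):
    """p holds at a class in the lower filtration iff the class lies inside ||p||."""
    return block <= V[p]

human_block = frozenset(V["Human"])                         # ||Human||

step1 = "Socrates" in V["Human"]                            # (51.1)
step2 = lower_true("Mortal", human_block)                   # (51.2) zooming in
step3 = block_of("Socrates", ["Human"]) == human_block      # (51.3)
conclusion = step1 and step2 and step3                      # licenses (51.4)
```

When all three steps succeed, Lemma 51.3.1 lets the truth of Mortal at the class be transferred back to the individual world Socrates, which agrees with the original valuation.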



51.5 Concluding Remarks 

The main characteristic of human reasoning is resource-boundedness. We 
cannot have an unlimited ability to reason. If we keep our model for reasoning 
fixed, then we must carry a great number of detailed facts. We must therefore 
ignore anything that is unnecessary to the current step of our reasoning. 
So what we should explore is a way of disregarding the many irrelevant 
things, and our proposal is to adopt a filtration-like method. In fact, from the 
first to the second steps in the previous section, we use relative filtration, 
which plays a part like 'zooming in' in the reasoning process. Then we can 
disregard details of our world that have no connection with the step. From the 
second to the third steps, on the other hand, a kind of inverse operation of 
filtration is used, with an effect like 'zooming out.' Then we can restore some 
of the details in order to draw some conclusion about our world. Hence, if 
a reasoning mechanism contains operations like zooming in and 
out, then we can focus our attention on what is essentially needed for each 
step of the reasoning process. 



Association rules in data mining are considered from the point of view of 
conditional logic and rough sets. In our previous work, given an association rule in 
some fixed database, its corresponding Kripke model was formulated. Then, 
two difficulties in the formulation were pointed out: limitation of the form of 
association rules and limited formulation of the models themselves. To resolve 
the defects, Chellas’s conditional logic was introduced and thereby, the class 
of conditionals in conditional logic can be naturally regarded as containing 
the original association rules. In this paper, further, an extension of condi- 
tional logic is introduced for dealing with association rules with intermediate 
values of confidence based on the idea of fuzzy-measure-based graded modal 
logic. 



52.1 Introduction 

The recent rapid progress of computer technology enables us to analyze a 
massive amount of transaction data in commercial database systems. This 
trend has prompted various approaches to knowledge discovery from large 
databases and, among them, the mining of so-called association rules proposed 
by Agrawal et al. [52.1] has gained widespread recognition as one 
of the most active themes of data mining. 

In our previous paper[52.9], we investigated logical meaning of association 
rules from a point of view of Chellas’s conditional logic[52.4] and Pawlak’s 
rough sets[52.10]. Thereby, we obtained a relationship between Chellas’s con- 
ditional logic and association rules with full confidence. The logic shows the 
difference between material implication and conditional, so our previous re- 
sult enables us to deal with exact inference on association rules as conditionals 
as well as an extension of the form of association rules. 

In this paper, as a next step of our previous work, we present an extension 
of conditional logic based on the idea of fuzzy-measure-based graded modal 
logic (cf. Murai et al.[52.6, 52.7, 52.8]), in which we can represent association 
rules with intermediate degrees of confidence. 
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52.2 Association Rules 

Let $\mathcal{I}$ be a finite set of items. Any subset of $\mathcal{I}$ is called an itemset in $\mathcal{I}$. 
An itemset can be a (possible) transaction. A database $\mathcal{D}$ is defined as a 
set of actual transactions, so $\mathcal{D} \subseteq 2^{\mathcal{I}}$. For an itemset $X (\subseteq \mathcal{I})$, its degree 
of support s(X) is defined by 

$$s(X) \stackrel{def}{=} \frac{|\{T \in \mathcal{D} \mid X \subseteq T\}|}{|\mathcal{D}|},$$

where $|\cdot|$ is the size of a set. Given a set of items $\mathcal{I}$ and a database $\mathcal{D}$, an 
association rule [52.1, 52.2] is an implication of the form $X \Longrightarrow Y$, where X 
and Y are itemsets in $\mathcal{I}$ with $X \cap Y = \emptyset$. An association rule 
$r = (X \Longrightarrow Y)$ holds in $\mathcal{D}$ with confidence c(r) ($0 \le c \le 1$) iff 
$c(r) = s(X \cup Y)/s(X)$. An association rule $r = (X \Longrightarrow Y)$ has a degree of 
support s(r) ($0 \le s \le 1$) in $\mathcal{D}$ iff $s(r) = s(X \cup Y)$. Mining of association 
rules is actually performed by generating all rules that have certain minimum 
support (denoted minsup) and minimum confidence (denoted minconf) that a user 
specifies. Consult, e.g., [52.1, 52.2, 52.3] for details of such algorithms for 
finding association rules. 
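
The definitions of support and confidence translate directly into code. The sketch below is a minimal illustration; the toy database of four transactions is hypothetical, and exact fractions are used so that the values can be compared without rounding.

```python
from fractions import Fraction

def support(itemset, db):
    """s(X): fraction of transactions in db that contain every item of X."""
    items = frozenset(itemset)
    return Fraction(sum(1 for T in db if items <= T), len(db))

def confidence(X, Y, db):
    """c(X => Y) = s(X u Y) / s(X); requires s(X) > 0."""
    return support(set(X) | set(Y), db) / support(X, db)

# Hypothetical database of four transactions.
DB = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

s_rule = support({"bread", "milk"}, DB)      # support of bread => milk
c_rule = confidence({"bread"}, {"milk"}, DB)  # confidence of bread => milk
```

Here the rule bread => milk has support 1/2 and confidence 2/3, so it would be reported for any minsup up to 1/2 and any minconf up to 2/3.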



52.3 Previous Works 

We describe association rules in a Kripke model. Given a set of items $\mathcal{I}$ and 
a database $\mathcal{D}$, we construct a modal language $\mathcal{L}_{ML}(\mathcal{I})$ in the usual way [52.4], 
where we regard any item as an atomic sentence. 

Definition 52.3.1 ([52.9]). For a given association rule $r = (X \Longrightarrow Y)$, its 
corresponding finite Kripke model $\mathcal{M}_r$ is defined as a structure 
$\langle W_\mathcal{D}, R_X, V_\mathcal{D} \rangle$, where (1) $W_\mathcal{D} = \mathcal{D}$, (2) for any T, T' in $W_\mathcal{D}$, 
$T R_X T'$ iff $T \cap X = T' \cap X$, so $R_X$ is an equivalence relation on $W_\mathcal{D}$, and 
(3) for any item x in $\mathcal{I}$, $V_\mathcal{D}(x, T)$ = true iff $x \in T$. 

Because $R_X$ is an equivalence relation, the modal operators defined in this 
model $\mathcal{M}_r$ satisfy the axiom schemata of KT5 (= S5). Note that the model 
described here depends on the premise X of a given association rule. 

Definition 52.3.2 ([52.9]). For an association rule $r = (X \Longrightarrow Y)$, let 
$X = \{x_1, \ldots, x_m\}$ and $Y = \{y_1, \ldots, y_n\}$. Then, two sentences $p_X$ and $p_Y$ 
are defined by 

$$p_X \stackrel{def}{=} x_1 \wedge \cdots \wedge x_m = \bigwedge_{x_i \in X} x_i,$$

$$p_Y \stackrel{def}{=} y_1 \wedge \cdots \wedge y_n = \bigwedge_{y_i \in Y} y_i.$$

Then, we have the next theorem: 

Theorem 52.3.1 ([52.9]). For an association rule $r = (X \Longrightarrow Y)$ and its 
corresponding model $\mathcal{M}_r$, 

$$c(X \Longrightarrow Y) = 1 \text{ iff } \mathcal{M}_r \models p_X \to p_Y,$$

where, in general, $\mathcal{M} \models p$ means that p is true at any world in $\mathcal{M}$. 

Now we find the following two problems: 

1. Limited form of association rules, whose antecedent and consequent both 
can take only the form of conjunction. 

2. Limited formulation of the models, which depends on the fixed antecedent 
of a given association rule. 

To resolve these defects, in [52.9], we introduced Chellas's conditional logic [52.4]. 

Given a set of items $\mathcal{I}$ and a database $\mathcal{D}$, we construct a language $\mathcal{L}_{CL}(\mathcal{I})$ 
for conditional logic [52.4], where the difference of formation rules is 

if $p, p' \in \mathcal{L}_{CL}(\mathcal{I})$ then $(p > p') \in \mathcal{L}_{CL}(\mathcal{I})$, 

where > expresses 'conditional.' 

Definition 52.3.3 ([52.9]). For a given database $\mathcal{D}$, its corresponding finite 
conditional model $\mathcal{M}_\mathcal{D}$ is defined as a structure $\langle W_\mathcal{D}, f_\mathcal{D}, V_\mathcal{D} \rangle$, where 
(1) $W_\mathcal{D} \stackrel{def}{=} \mathcal{D}$, (2) for any world (transaction) T in $W_\mathcal{D}$ and any set of 
itemsets $\mathcal{X}$, $f_\mathcal{D}(T, \mathcal{X}) \stackrel{def}{=} \mathcal{X}$, and (3) for any item x in $\mathcal{I}$, 
$V_\mathcal{D}(x, T)$ = true iff $x \in T$. 

Then we have the following theorem: 

Theorem 52.3.2 ([52.9]). Given a database $\mathcal{D}$ and its corresponding 
conditional model $\mathcal{M}_\mathcal{D}$, for an arbitrary association rule $r = (X \Longrightarrow Y)$, 

c(r) = 1 iff $\mathcal{M}_\mathcal{D} \models p_X > p_Y$, 

0 < c(r) < 1 iff $\mathcal{M}_\mathcal{D} \models \neg((p_X > p_Y) \vee (p_X > \neg p_Y))$, 

c(r) = 0 iff $\mathcal{M}_\mathcal{D} \models p_X > \neg p_Y$. 

The theorem provides us with a rigid correspondence between association rules 
with full and no confidence and a subclass of conditionals in conditional 
logic. Thus, in the framework of conditional logic, we can regard the set of 
conditionals as an extension of association rules. 
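
The three cases of Theorem 52.3.2 can be checked on a toy database. The sketch below rests on an assumption that is not restated in the text: the truth condition of Chellas-style standard conditional models, under which p > q is true at a world T iff f(T, ||p||) is a subset of ||q||. With f(T, X) = X this reduces to an inclusion between propositions that does not depend on T. The database and function names are hypothetical.

```python
def proposition(itemset, db):
    """||p_X||: the transactions in which every item of X occurs."""
    items = frozenset(itemset)
    return {T for T in db if items <= T}

def classify_confidence(X, Y, db):
    """Decide which case of Theorem 52.3.2 holds, using only inclusions
    between propositions (assumed truth condition: ||p|| subset of ||q||)."""
    p_x, p_y = proposition(X, db), proposition(Y, db)
    if not p_x:
        raise ValueError("confidence is undefined when no transaction contains X")
    if p_x <= p_y:                  # p_X > p_Y holds
        return "c = 1"
    if p_x <= (set(db) - p_y):      # p_X > not p_Y holds
        return "c = 0"
    return "0 < c < 1"              # neither conditional holds

# Hypothetical database of five transactions.
DB = {frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"tea"},
)}
```

For example, butter => bread falls in the first case, tea => milk in the last, and bread => milk in the intermediate case where neither conditional is valid.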



52.4 Graded Conditional Logic 

In this section, we introduce a graded minimal conditional logic in order to 
make direct treatment of intermediate degrees of confidence of association 
rules. For minimal conditional models, see Ghellas[52.4]. Given a set of items 
X, a language Cgc\_{X) for graded conditional logic is formed from X in the 
usual way, where the difference is 

if p,p' G XgCL(X) and 0 < k < I then {p >k p') G XgCL(X), 

where i>fc is graded conditional for 0 < fc < 1. 

Let us formulate a finite graded conditional model Mgv for a given da- 
tabase V. 
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Definition 52.4.1. A finite graded minimal conditional model Mgv is a 
structure <Wv,{9k}o<k<i,Vv>, where 



gk{w,X) =' G I > k}. 



The basic idea of this definition is the same as in fuzzy-measure-based models 
for graded modal logic (cf. [52.6, 52.7, 52.8]). After all, in this kind of models, 
the truth condition of graded conditional is defined by 



V (p >k <h w) = true iff 
iff 



iibr^-i - • 



Because we use probability for the definition of the function gk, we have the 
soundness results based on fuzzy-measure-based semantics (cf. [52.6, 52.8]) 
shown in the following table. 



Table 52.1. Soundness results of graded conditionals. 
The conditional probability adopted in Definition 52.4.1 is nothing but the
degree of confidence when it is applied to an association rule $r = (X \Rightarrow Y)$.
In fact, by easy calculation, we have

$\Pr(\|p_Y\| \mid \|p_X\|) = \dfrac{|\,\|p_X\| \cap \|p_Y\|\,|}{|\,\|p_X\|\,|} = c(r)$,

so, for any transaction $T$ in $W_{\mathcal{D}}$,

$V_{\mathcal{D}}(p_X >_{c(r)} p_Y, T) = \text{true}$.

Thus we can conclude that an association rule $r = (X \Rightarrow Y)$ can be
translated into a graded conditional $(p_X >_k p_Y)$ where $k = c(r)$.
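
The translation can be checked on a toy database. The following is a minimal Python sketch, not the authors' implementation: the database `db` and the helper names `confidence` and `graded_conditional_holds` are hypothetical, and the truth condition is assumed to be "conditional probability at least k".

```python
from fractions import Fraction

def confidence(transactions, x, y):
    """Confidence c(r) of the rule X => Y over a list of transactions (sets of items)."""
    x, y = set(x), set(y)
    covers_x = [t for t in transactions if x <= t]
    if not covers_x:
        raise ValueError("no transaction contains X; confidence is undefined")
    covers_xy = [t for t in covers_x if y <= t]
    return Fraction(len(covers_xy), len(covers_x))

def graded_conditional_holds(transactions, x, y, k):
    """Assumed truth condition of (p_X >_k p_Y): Pr(Y | X) >= k."""
    return confidence(transactions, x, y) >= k

# Hypothetical four-transaction database.
db = [frozenset(t) for t in ({"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"})]

c = confidence(db, {"a"}, {"b"})  # 2 of the 3 transactions containing "a" also contain "b"
print(c, graded_conditional_holds(db, {"a"}, {"b"}, c))
```

With k set to c(r) the graded conditional holds by construction, which mirrors the last step of the argument above; any larger k fails.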



52.5 Concluding Remarks 

This paper, one of a series of our papers on the theoretical relationship
between association rules and conditional logic, took a first step toward
graded conditionals that correspond to association rules with intermediate
degrees of confidence.
53.1 Introduction

A large number of individuals with disabilities engage in problem behaviors 
which are influenced by environmental and social factors [16]. A smaller, but
significant, proportion of problem behaviors appear to be maintained by
physiological events [4, 11, 16]. Over time, problem behavior maintained primarily by
physiological events may be influenced by environmental factors [3]. For these 
reasons, a more sophisticated assessment approach that considers the interrelation 
between physiological and environmental factors is needed [9, 16]. 

The purpose of this descriptive study was to use the data mining system LERS to
explore the relation between arousal level, environmental events, and the occur- 
rence of problem behavior. The adult subject who participated in this study was 
diagnosed with severe mental retardation, had visual impairments, and engaged in 
serious problem behavior. The data from this study were collected by using an 
athlete’s heart rate monitor to gather heart rate every 15 seconds. A fixed time pe- 
riod was identified through the functional assessment where problem behaviors 
were likely, activities were highly predictable and did not vary, and the subject 
engaged in roughly the same type of motor movements. A camcorder was turned 
on at the same time as the heart rate monitor and the researchers coded the tapes 
later in real time using a software program to record behavioral and environmental 
events [12, 15]. Two independent observers recorded 30% of all baseline sessions 
and 34% of all remaining sessions for this subject in order to establish interob- 
server agreement. 

Originally, sequential analysis procedures were used to assess conditional 
probabilities of heart rate change and problem behavior given predictor conditions 
[1, 7]. This previous study confirmed that the Polar Heart rate monitor identified 
unique patterns of heart rate in relation to problem behavior [6] . 

The problem behaviors identified in this study included self-injury in the form 
of self-biting and aggressive behaviors, specifically, striking out or attempting to 
bite others. Disruptive behaviors in the form of throwing objects, pushing away 
from a task, or using two hands to raise his chair forcefully upwards while in a
sitting position were also assessed. Finally, the functional assessment included
self-stimulatory behavior that involved pressing on the carotid artery ("neck
press"). Results of the original study indicated that self-biting, aggression towards 
others, and disruptive behaviors were maintained by escape from non-preferred 
tasks. Demand statements and the presence of specific activities were identified 
as antecedent variables that occasioned problem behavior. Evidence gathered 
during an initial assessment suggested that the act of pressing on the carotid artery 
triggered internal sensations that were of a reinforcing quality to the subject and 
tended to occur in the absence of environmental stimuli. The subject’s heart rate 
increased in the 15-second interval following the occurrence of self-biting and ag- 
gressive behaviors [6]. 

In our study 26 sessions were recorded; in this paper we present rules induced
from a transformed data set coming from only one such session.

A modified version of this paper was presented at the Japanese Society for Arti- 
ficial Intelligence International Workshop on Rough Set Theory and Granular 
Computing, RSTGC-2001 [5]. 



53.2 Data Mining 

The original, raw data describing self-injurious behavior were divided into two
parts. In the first part the subject’s heart rate was recorded every 15 seconds. The
second data set consisted of values of three variables: a combined variable
representing the subject’s behavior and external stimulus (different codes were used to
avoid ambiguity) and two variables representing the moments of time (in seconds, 
counting from the beginning of the session). The first of these two variables indi- 
cated when a specific event started, while the second variable indicated when the 
event finished. 

The main problem was that the original, raw data were temporal
(time-dependent). Secondly, the original, raw data sets were not synchronized with each
other, i.e., they presented situations at different moments of time. The original,
raw data sets were eventually transformed into one data set, called in the sequel
the transformed data set, which was processed later by the data mining system LERS.

The first step of data preparation was discretization of the numerical variable
Heart Rate. The next step was adding new variables, Behavior-15, Behavior-30,
External_Stimulus-15, and External_Stimulus-30, describing values of the
variables Behavior and External_Stimulus 15 seconds earlier and 30 seconds earlier,
respectively.
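
The two preparation steps can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure actually used: it assumes one synchronized row per 15-second interval, the cut points 90 and 110 beats per minute are hypothetical (the paper does not report them), and the function names are introduced here only for illustration.

```python
def discretize_heart_rate(bpm, cuts=(90, 110)):
    """Map a numeric heart rate to a symbolic value (cut points are hypothetical)."""
    if bpm < cuts[0]:
        return "normal"
    if bpm < cuts[1]:
        return "high"
    return "very_high"

def add_lagged(rows, names=("Behavior", "External_Stimulus"), lags=(1, 2), step=15):
    """Add e.g. Behavior-15 and Behavior-30: values 1 and 2 intervals earlier."""
    out = []
    for i, row in enumerate(rows):
        new = dict(row)
        for name in names:
            for lag in lags:
                src = i - lag
                # None marks intervals before the start of the session.
                new[f"{name}-{lag * step}"] = rows[src][name] if src >= 0 else None
        out.append(new)
    return out

def transform(rows):
    """Discretize Heart_Rate, then append the lagged variables."""
    symbolic = [dict(r, Heart_Rate=discretize_heart_rate(r["Heart_Rate"])) for r in rows]
    return add_lagged(symbolic)

# Hypothetical three-interval excerpt of a session.
rows = [
    {"Heart_Rate": 82, "Behavior": "none", "External_Stimulus": "demand"},
    {"Heart_Rate": 104, "Behavior": "none", "External_Stimulus": "activity"},
    {"Heart_Rate": 121, "Behavior": "self_bite", "External_Stimulus": "praise"},
]
transformed = transform(rows)
print(transformed[2])
```

Each transformed row then describes one interval together with its recent history, which is what lets a rule inducer treat the temporal data as an ordinary attribute-value table.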

In this study, the data mining system LERS (Learning from Examples based on 
Rough Sets) [8] was used to analyze the relation between the subject’s behavior, 
external stimuli, and heart rate. Rules were induced by algorithm LEM2 [8], a 
part of the system LERS. 

As the first step of LERS processing of the input data file, LERS checks if the 
input data file is consistent (i.e., if the file does not contain conflicting cases). Our 
data set was inconsistent; therefore, LERS computed lower and upper
approximations of all concepts. The ideas of lower and upper approximations are
fundamental for rough set theory [13, 14]. Rules induced from the lower approximation
of the concept certainly describe the concept, so they are called certain. On the 
other hand, rules induced from the upper approximation of the concept describe 
the concept only possibly (or plausibly), so they are called possible [8]. 
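
To make the two approximations concrete, here is a minimal Python sketch. It is an illustration of the standard rough set definitions, not LERS code; the four cases and the attribute names are hypothetical.

```python
from collections import defaultdict

def approximations(cases, attributes, decision, concept):
    """Return (lower, upper) approximations of a concept as sets of case indices."""
    # Group cases that are indiscernible on the condition attributes.
    blocks = defaultdict(set)
    for idx, case in enumerate(cases):
        blocks[tuple(case[a] for a in attributes)].add(idx)
    target = {i for i, c in enumerate(cases) if c[decision] == concept}
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # every case in the block belongs to the concept
            lower |= block
        if block & target:    # at least one case in the block belongs to it
            upper |= block
    return lower, upper

# Hypothetical data; cases 0 and 1 conflict (same conditions, different behavior).
cases = [
    {"Heart_Rate": "very_high", "Activity": "yes", "Behavior": "disruptive"},
    {"Heart_Rate": "very_high", "Activity": "yes", "Behavior": "none"},
    {"Heart_Rate": "normal", "Activity": "no", "Behavior": "none"},
    {"Heart_Rate": "high", "Activity": "yes", "Behavior": "disruptive"},
]
lower, upper = approximations(cases, ["Heart_Rate", "Activity"], "Behavior", "disruptive")
print(sorted(lower), sorted(upper))
```

Certain rules would be induced from the cases in `lower`, possible rules from the cases in `upper`; the conflicting pair appears only in the upper approximation.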

Rules induced from raw, training data are used for classification of unseen, 
testing data. The classification system of LERS is a modification of the bucket 
brigade algorithm [2, 10]. The decision to which concept a case belongs is made 
on the basis of three factors: strength, specificity, and support. They are defined 
as follows: Strength is the total number of cases correctly classified by the rule 
during training. Specificity is the total number of attribute-value pairs on the left- 
hand side of the rule. The matching rules with a larger number of attribute-value 
pairs are considered more specific. The third factor, support, is defined as the sum 
of scores of all matching rules from the concept. The concept C for which the
support (i.e., the sum of the products of strength and specificity over all rules
matching the case) is the largest is the winner, and the case is classified as being a
member of C.
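
The voting scheme just described can be sketched in a few lines of Python. This is a simplified illustration, not the LERS classifier itself: the rule representation (a dictionary with `conditions`, `concept`, and `strength`) and the three rules are hypothetical, and cases matched by no rule are outside the scope of the sketch.

```python
from collections import defaultdict

def classify(case, rules):
    """Pick the concept with the largest support among rules that match the case."""
    support = defaultdict(int)
    for rule in rules:
        conditions = rule["conditions"]
        if all(case.get(attr) == value for attr, value in conditions.items()):
            specificity = len(conditions)  # attribute-value pairs on the left-hand side
            support[rule["concept"]] += rule["strength"] * specificity
    if not support:
        return None  # no matching rule; not handled in this sketch
    return max(support, key=support.get)

# Hypothetical rules with made-up strengths.
rules = [
    {"conditions": {"Heart_Rate": "very_high", "Activity": "yes"},
     "concept": "disruptive", "strength": 3},
    {"conditions": {"Heart_Rate": "very_high"}, "concept": "self_bite", "strength": 4},
    {"conditions": {"Activity": "yes"}, "concept": "self_bite", "strength": 1},
]
case = {"Heart_Rate": "very_high", "Activity": "yes"}
winner = classify(case, rules)
print(winner)
```

In this example the single two-condition rule scores 3 x 2 = 6, while the two one-condition rules together score 4 + 1 = 5, so the more specific rule decides the outcome.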



53.3 Results 

Normal heart rate both in the current interval and in the preceding 30-second in- 
terval strongly predicted the absence of problem behavior. The absence of be- 
havior was likely when high heart rate and staff praise occurred in the current in- 
terval, with very high heart rate in the preceding 15-second interval. Demands 
and high heart rate 30 seconds prior to an identified interval predicted the absence 
of problem behavior as well. There was a higher likelihood of disruptive behavior
when the subject’s heart rate in the current interval and in the previous 15- and
30-second intervals was very high and an activity was occurring 15 seconds before
the target interval. Disruptive behavior also was predicted when the activity was
occurring in the current interval, and there was a very high heart rate in the
current, previous 15-second, and 30-second intervals. Disruptive behavior was
likely in the presence of very high heart rates 30 seconds and 15 seconds prior to the
targeted interval, very high heart rate in the current interval, and an activity in the
previous 30 seconds. An uncertain rule similar to this pattern predicted
self-biting. In this case, a self-bite was predicted when heart rate was very high in the
current interval and in the 15 and 30 seconds before the target interval, with an activity
occurring 30 seconds prior to the behavior. Self-bites were predicted when there
was a high heart rate and staff praise occurring 30 seconds before, with high heart 
rate 15 seconds prior to the behavior and in the current interval. Neck press was 
predicted when both demand and praise statements had occurred in the previous 
30 second interval while heart rate was very high and heart rate remained high in 
the current interval. 
53.4 Conclusions

Intriguing patterns arose from the data providing information about antecedent 
events in rich detail. Normal heart rate was highly predictive of the absence of 
problem behavior. High and very high heart rates were more likely to be included 
in rule statements predicting self-biting, disruptive, and aggressive behavior. Rule 
statements suggested that activities were more likely to be present in the current 
interval which provides support for the hypothesis that the subject’s behavior was 
maintained by escape from nonpreferred tasks. Heart rate data collected every 15 
seconds created some difficulties interpreting the data. A number of independent 
and dependent variables would occur within the same 15-second interval, making 
it difficult to identify whether an external stimulus was actually an antecedent 
event. 

More sensitive heart rate equipment must be used in order to truly explore the
relation between heart rate and problem behavior as well as the utility of the
LERS data mining system. In addition, a richer set of codes exploring environmental
events that influence behavior would increase the consistency of the data and pro- 
vide a more comprehensive understanding of the variables influencing problem 
behavior. 

Further research is needed to compare the results of the session analyzed in this 
article with the remaining 25 sessions for this subject. Another option would be to 
combine the 26 sessions in order to conduct a comprehensive analysis. 

This study represents an important first step in the analysis of heart rate,
behavior, and environmental events. Despite the challenges encountered in
collecting and analyzing heart rate data in real life environments, new methodological
approaches are greatly needed to fully understand the link between physiology and
behavior. The data mining system LERS holds great potential utility for clinical support
of individuals with disabilities who engage in problem behavior.
This paper describes a new clustering method based on rough set theory.
This method classifies objects according to the indiscernibility relations de- 
fined on the basis of relative similarity. First, an initial equivalence relation, 
which evaluates local similarity of objects, is assigned to every object. Then 
modification of the initial equivalence relations is performed by examining 
global relationships among them. An initial equivalence relation is
modified if it gives an excessively fine classification of the objects. Consequently,
generation of small categories is suppressed and adequately coarse clusters are
formed. Experimental results on artificial data showed that this method
produced good clustering results for both nominal and numerical data.



54.1 Introduction 

Advances in communication systems and high performance computers enable 
us to easily collect valuable information from the Internet and to construct 
databases that store huge amounts of information. Clustering has been
receiving considerable attention as one of the most promising approaches for
revealing underlying structure in such databases. However, the well-known 
clustering algorithms such as K-means [54.1] and Fuzzy C-Means (FCM) 
[54.2] have difficulty in handling nominal data since they require calculation 
of the cluster centers. Although the agglomerative hierarchical clustering me- 
thod [54.3] enables handling of nominal data by the use of relative similarity, 
it still has a problem that the clustering result strongly depends on the order 
of handling objects. 

This paper introduces an order-independent clustering method for no- 
minal and numerical data based on rough sets [54.4]. Objects are classified 
according to the indiscernibility relations defined on the basis of relative si- 
milarity. In the first step, we form initial equivalence relations among objects 
based on their relative similarity, and classify them into categories
according to the relations. A similarity threshold is determined independently
for each object. In the second step, we modify similar equivalence relations
into one type of relation so that it represents simpler knowledge, which
generates an adequate number of categories. The optimal clustering result can
be obtained by evaluating the cluster validity, defined using upper and lower 
approximations of a cluster, throughout all clusters generated with various 
threshold values for modification. In the experiments we demonstrate that 
this method produces good clustering results for both nominal and numeri- 
cal data. 



54.2 Clustering Method 

The method consists of two steps: (1) Assignment of initial equivalence relati- 
ons and (2) Modification of initial equivalence relations. Step (2) is repeated 
using various values of modification threshold Th, and the best clustering 
result which yields maximum validity is obtained. 



54.2.1 Initial Equivalence Relation 

In the first step, an initial equivalence relation is assigned to each object. 
Defined on the basis of relative similarity, an equivalence relation splits the 
entire set into two equivalence classes: one containing similar objects and 
another containing dissimilar objects. Namely, each relation performs classi- 
fication by evaluating local similarity between its corresponding object and 
other objects. 

[Definition 1] Initial Equivalence Relation 

Let $U = \{x_1, x_2, \ldots, x_n\}$ be the set of objects we are interested in, and assume
that each object has $p$ attributes represented by nominal or numerical values.
An equivalence relation $R_i$ for object $x_i$ is defined by

$R_i = \{\{P_i\}, \{U - P_i\}\}$,

where

$P_i = \{x_j \mid s(x_i, x_j) > S_i\}, \ \forall x_j \in U$.

The notation $s(x_i, x_j)$ denotes similarity between objects $x_i$ and $x_j$, and $S_i$
denotes a threshold value of similarity for object $x_i$. The equivalence relation
$R_i$ classifies $U$ into two categories: one containing objects similar to $x_i$ and
another containing objects dissimilar to $x_i$. When $s(x_i, x_j)$ is larger than $S_i$,
object $x_j$ is considered to be indiscernible from $x_i$. Similarity $s(x_i, x_j)$ is
calculated as a weighted sum of the Mahalanobis distance $d_M(x_i, x_j)$ of numerical
attributes and the Hamming distance $d_H(x_i, x_j)$ of nominal attributes as:




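$$s(x_i, x_j) = \frac{p_c}{p}\left(1 - \frac{d_M(x_i, x_j)}{\max_{x_i, x_j} d_M(x_i, x_j)}\right) + \frac{p_d}{p}\left(1 - \frac{d_H(x_i, x_j)}{\max_{x_i, x_j} d_H(x_i, x_j)}\right),$$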



where $p_c$ and $p_d$ denote the numbers of numerical and nominal attributes,
respectively.

Similarity threshold $S_i$ is determined based on the gradient of similarity.
Let $s(x_i, x_j)$ be the similarity arranged in descending order on $j$ and let
$s'(x_i, x_j)$ be the first-order derivative of $s(x_i, x_j)$. Then $s'(x_i, x_j)$ is derived as
a convolution of $s(x_i, x_j)$ and the first derivative of a Gaussian function as follows.

$$s'(x_i, x_j) = \int_{-\infty}^{\infty} \frac{-(j-u)}{\sigma^3 \sqrt{2\pi}} \, e^{-(j-u)^2 / 2\sigma^2} \, s(x_i, x_u) \, du,$$

where $s(x_i, x_j) = 1$ and $s(x_i, x_j) = 0$ are used for $j < 0$ and $j > n$ respectively. Using
the mean and standard deviation of $s'(x_i, x_j)$, denoted respectively by $\mu_{s'}(i)$ and
$\sigma_{s'}(i)$, we seek the minimal $j^*$ that first satisfies

$$s'(x_i, x_{j^*}) > \mu_{s'}(i) + \sigma_{s'}(i)$$

and obtain $j^*$ where similarity decreases largely compared to the others.
Finally, $S_i$ is obtained as $S_i = s(x_i, x_{j^*})$.

[End of Definition] 



54.2.2 Modification of Equivalence Relations 

Objects should be classified into the same category when most of the equivalence
relations commonly regard them as indiscernible. However, these similar objects
may be classified into different categories depending on the combination of
initial equivalence relations. In such a case, an undesirable clustering result
consisting of small, fine categories will be obtained.

In the second step, we modify the initial equivalence relations
in order to suppress excessive generation of small categories. When an
initial relation classifies two objects into different categories, but the number
of relations that regard them as indiscernible is larger than a given threshold
$T_h$, this relation is modified to include the two objects in the same category.



[Definition 2] Indiscernibility Degree of Objects 

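An indiscernibility degree, $\gamma(x_i, x_j)$, of two objects $x_i$ and $x_j$ is defined as

$$\gamma(x_i, x_j) = \frac{1}{|U|} \sum_{k=1}^{|U|} \delta_k(x_i, x_j),$$

where

$$\delta_k(x_i, x_j) = \begin{cases} 1, & \text{if } [x_k]_{R_k} \cap ([x_i]_{R_k} \cap [x_j]_{R_k}) \neq \emptyset, \\ 0, & \text{otherwise.} \end{cases}$$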



[End of Definition] 
On giving an indiscernibility degree to every pair of objects, modification of
each equivalence relation is performed according to the following procedure.

[Definition 3] Modification of Equivalence Relations

Let $R_i, R_j \in \mathbf{R}$ be initial equivalence relations and let $R'_i, R'_j \in \mathbf{R}'$ be
equivalence relations after modification. For an initial equivalence relation $R_i$, a
modified equivalence relation $R'_i$ is defined as

$R'_i = \{\{P'_i\}, \{U - P'_i\}\}$,

where $P'_i$ denotes a subset of objects represented by

$P'_i = \{x_j \mid \gamma(x_i, x_j) \ge T_h\}, \ \forall x_j \in U$.

The value $T_h$ denotes the lower threshold value of indiscernibility degree to
regard $x_i$ and $x_j$ as indiscernible objects.

[End of Definition] 
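
Definitions 1 to 3 can be traced on a small example. The following Python sketch is an illustration only, not the authors' implementation: the similarity matrix and the per-object thresholds are hypothetical inputs (the gradient-based choice of thresholds is not reproduced), objects are referred to by index, and the function names are introduced here for illustration.

```python
def initial_classes(sim, thresholds):
    """P_i: objects whose similarity to x_i exceeds the threshold S_i."""
    n = len(sim)
    return [{j for j in range(n) if sim[i][j] > thresholds[i]} for i in range(n)]

def class_of(classes, k, x, universe):
    """Equivalence class of x under R_k: P_k if x is in P_k, otherwise U - P_k."""
    return classes[k] if x in classes[k] else universe - classes[k]

def indiscernibility_degree(classes, i, j):
    """Fraction of relations R_k whose class of x_k also contains both x_i and x_j."""
    universe = set(range(len(classes)))
    hits = 0
    for k in range(len(classes)):
        common = (class_of(classes, k, k, universe)
                  & class_of(classes, k, i, universe)
                  & class_of(classes, k, j, universe))
        hits += 1 if common else 0
    return hits / len(classes)

def modified_classes(classes, th):
    """P'_i: objects whose indiscernibility degree with x_i is at least th."""
    n = len(classes)
    return [{j for j in range(n) if indiscernibility_degree(classes, i, j) >= th}
            for i in range(n)]

def clusters(classes):
    """Group objects that no relation distinguishes (same membership pattern)."""
    groups = {}
    for x in range(len(classes)):
        signature = tuple(x in p for p in classes)
        groups.setdefault(signature, []).append(x)
    return sorted(groups.values())

# Hypothetical similarities: objects 0, 1, 2 are mutually close, object 3 is isolated.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.4, 0.1],
    [0.8, 0.4, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]
initial = initial_classes(sim, thresholds=[0.5, 0.5, 0.5, 0.5])
modified = modified_classes(initial, th=0.25)
print(clusters(initial), clusters(modified))
```

In this example the initial relations split the four objects into four singleton categories, because objects 1 and 2 disagree about each other. After modification with the assumed threshold 0.25, objects 0, 1, and 2 share one category and object 3 keeps its own, which is the coarsening effect the second step aims for.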



54.2.3 Evaluation of Validity 

Depending on the choice of threshold $T_h$, a number of modified sets of
equivalence relations can be obtained in the preceding steps. We then evaluate
the validity of their clustering results based on the following criteria.

1. Modification degree: It represents how largely $\mathbf{R}$ was modified to be $\mathbf{R}'$.
High validity is assigned when $\mathbf{R}'$ was obtained with small modification.

2. The number of objects in each category: High validity is assigned when
each of the categories generated by $\mathbf{R}'$ contains an adequate number of objects.

[Definition 4] Validity of Clustering Result 

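Let $U$ denote the entire set of objects, $\mathbf{R}$ denote a set of initial equivalence
relations, and $\mathbf{R}'$ denote the modified set of $\mathbf{R}$, respectively. Suppose that $\mathbf{R}'$
classifies $U$ into $l$ categories, $U/IND(\mathbf{R}') = \{C_1, C_2, \ldots, C_l\}$. A modification
degree for the $k$-th category, $\alpha_{\mathbf{R}}(C_k)$, is defined by

$\alpha_{\mathbf{R}}(C_k) = |\underline{\mathbf{R}}C_k| / |\overline{\mathbf{R}}C_k|$,

where $\underline{\mathbf{R}}C_k$ and $\overline{\mathbf{R}}C_k$ denote the $\mathbf{R}$-lower and $\mathbf{R}$-upper approximations of $C_k$,
given below:

$$\underline{\mathbf{R}}C_k = \bigcup_{x_i \in C_k} \{[x_i]_{R_i} \mid [x_i]_{R_i} \subseteq C_k\},$$

$$\overline{\mathbf{R}}C_k = \bigcup_{x_i \in C_k} \{[x_i]_{R_i} \mid [x_i]_{R_i} \cap C_k \neq \emptyset\}.$$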
The number of objects in the $k$-th category is represented by $|C_k|$. The total
validity of the clustering result is defined by

‘ fc=i 

[End of Definition] 



54.3 Experimental Results 

The proposed method was applied to a two-dimensional numerical data set
containing 58 objects generated by the Neyman-Scott method [54.5]. Results are
summarized as follows. Without performing modification of initial equivalence
relations, we obtained 10 clusters including 7 small ones. With modification,
equivalence relations which contributed to generation of small clusters were
modified so that they included similar objects in close clusters.
Consequently, we obtained the 3 expected clusters. It required about three seconds
to process the numerical data on a workstation (SGI OCTANE2, R12000,
400MHz). More detailed description and results on the BALLOON database
[54.6] are available in the original paper [54.7].



54.4 Conclusions 

This paper has proposed a rough sets-based clustering method with modi- 
fication of equivalence relations. In the experiments we demonstrated that 
the use of relative similarity and modification of initial equivalence relation 
led to successful classification on both types of data. It remains as future
work to represent knowledge used for classification by using the modified set
of equivalence relations. 
This paper proposes an architecture for a rough set processor. The theory of rough
sets has many applications, such as data mining, decision support systems,
and machine learning. However, no dedicated processor has been
developed. In this paper, the architecture of a rough set processor is shown.



55.1 Introduction 

A rough set processor is a foundation for large-scale application of the theory.
Therefore, this paper proposes an architecture for a rough set processor that can
solve large-scale problems in real time. In this paper, the architecture of the rough
set processor is described.



55.2 Architecture 

The inputs of the rough set processor are decision tables and the outputs are
rules, which are represented by logic functions.

The block diagram for the rough set processor is shown in Figure 55.1.
The processor will run as a co-processor of a host computer, sharing “Main
Memory”. In this section, the data format, execution process, and architecture
are described.

55.2.1 Data Format 

The input data is one column of the discernibility matrix, which has 2,000 binary
attributes and one million data items. It corresponds to a product-of-sums
expression of a logic function. The input data consists of the following fields:

Data: Existence of variables in each logic term; 

Sum: Total number of variables in each logic term; 

Flag: Flag for covering by cores. 

The required memory size is 256 MB for one million logic terms, because 
one logic term requires 256 bytes. 
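
The memory figure can be checked with a few lines of arithmetic. The sketch below is an illustration under an assumed record layout: the text gives only the 256-byte record size, so the split between the “Data” bits and the remaining bytes for “Sum” and “Flag” is an assumption, and 1 MB is taken as 10^6 bytes.

```python
ATTRIBUTES = 2_000      # binary attributes per logic term (from the text)
TERMS = 1_000_000       # logic terms (from the text)
RECORD_BYTES = 256      # bytes per logic term (from the text)

data_bytes = (ATTRIBUTES + 7) // 8        # bytes needed for one bit per attribute
spare_bytes = RECORD_BYTES - data_bytes   # assumed to hold the Sum and Flag fields
total_bytes = TERMS * RECORD_BYTES

print(data_bytes, spare_bytes, total_bytes // 10**6)
```

Under these assumptions 250 bytes hold the attribute bits, 6 bytes remain for the other two fields, and the total is 256,000,000 bytes.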
Fig. 55.1. Block diagram of rough set processor.



55.2.2 Execution Process 

The execution process consists of two parts, a pre-process and a main
process [55.3]. In the pre-process, some sparse terms (rows) are selected as
“cores”, and then the input logic function is reduced by the implying relation.
In the main process, the input logic function is approximately converted to a
simple sum-of-products format.

55.2.3 Discernibility Matrix Maker 

The task of this unit is conversion from a decision table to a binary discernibility
matrix. This process is very complex, because decision tables include not
only binary attributes, but also real number or string attributes. For example,
selection of optimized threshold values for real number attributes is a sufficiently
complex problem in itself. Therefore, this unit is not treated in the present paper.
55.2.4 Core Selector

This unit selects the data which have a small value in the “Sum” field, then transfers
the row number of the selected data to the “Core Number Register” in this unit.

55.2.5 Covering Unit 

This unit reduces the given function by the implying operation with core data.
The results of reduction are stored in the “Flag” bit. The flag is set to “1” if the
row can be deleted, and to “0” if it cannot be deleted.

This implying operation is performed on 256 bits in parallel by using four
implying units, each of which consists of 64 implying cells. The implying relation
$c \subseteq a$ is equivalent to $\bar{c} + a$. Therefore, by inverting each bit of the core data,
a conventional OR gate can be used as an implying gate.
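
The inverted-core OR test is easy to mimic in software. The following Python sketch is an illustration, not a model of the actual circuit: rows are represented as integers used as 256-bit vectors, the function names are hypothetical, and it is assumed that a row can be deleted exactly when it contains every variable of the core.

```python
WIDTH = 256
MASK = (1 << WIDTH) - 1  # all ones over the 256-bit word

def implies(core, row):
    """True when every variable set in core is also set in row (core is a subset of row).

    Mirrors the hardware trick: invert the core bits, OR with the row,
    and check that every bit of the result is 1.
    """
    return ((~core & MASK) | row) == MASK

def covering_flags(core, rows):
    """Flag is 1 for rows covered by the core (assumed deletable), else 0."""
    return [1 if implies(core, row) else 0 for row in rows]

# Hypothetical 4-variable example stored in the low bits of the word.
core = 0b0011
rows = [0b0111, 0b0101, 0b1011]
flags = covering_flags(core, rows)
print(flags)
```

The first and third rows contain both core variables and are flagged, while the second row lacks one of them and is kept.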

55.2.6 Reconstruction Unit 

The main operations of this unit are:

OP1 Search for dominant logic variables from the input function (discernibility
matrix).

OP2 Reconstruction of the function by using the above variables.

The main task is OP1. For this purpose, counting is necessary,
although high accuracy is not required. Figure 55.2 shows the block diagram of the
counter unit. We use an approximate method to reduce operation time and
circuit area. The strategies are:

1. Skipping of rows: The unit counts 20% of the data, chosen randomly. As a result,
the operation speed becomes five times faster without large error, if the
input data is large-scale and uniform.

2. Usage of small counters: We use 8-bit counters, although 20-bit counters
are required ($10^6 \approx 2^{20}$). However, all counters have to be shifted right
when one of the counters overflows. This approximation makes it
possible to reduce the size of the counters. Therefore, if we use the same chip
area, we can increase the degree of parallelism. As a result, the operation
speed can be improved.

This unit has 64 counters of 8-bit length. The counting operation is
executed on 64 bits in parallel. The number of overflows is stored in the “Over
Flow Counter”. The counting results are shifted left when the data are stored
to the “Count Cache”. The count cache requires 20 bits x 2,032 words. The address
of the data to be counted is decided by “Random-Row-GEN”. “COUNT MASK”
is used to reset columns which are already recognized as dominant attributes.

After the counting, we must find the dominant variable. An accurate
method is not necessary for our purpose. Therefore, a simple approximation
technique is used [55.3].
Fig. 55.2. Block diagram of counter unit.



55.2.7 Implementation 

To obtain high performance, high-speed data transfer is very important, while
the gate switching speed is not so important. Therefore, large-scale and
high-speed cache memory should be prepared. The implementation will be done
using an FPGA (Field Programmable Gate Array) with logic synthesis CAD
tools.

55.2.8 Performance Analysis 

It is necessary to compare the performance of the rough set processor with
software-based tools. We assume that the tools run on a RISC machine with large
cache memory and an optimized compiler system, because the rough set processor
deals with large but simple data processing. The difference in processing of
unit data between the processor and a software-based tool will not be large.
The improvement of performance is mainly achieved by parallel processing.
By introducing the following parameters,

$N_{pc}$: degree of parallelism in columns,

$N_{pr}$: degree of parallelism in rows,

$N_{ft}$: number of logic terms in the output function,

$R_h$: ratio of operating speed of the processor to that of a general (RISC) CPU,

the performance improvement is given by

$R = N_{pc} \cdot N_{pr} \cdot N_{ft} \cdot R_h$. (55.1)

For example, assuming $N_{pc} = 128$, $N_{pr} = 4$, $N_{ft} = 16$, $R_h = 2$, we obtain
$R = 16{,}384$. The key to good performance is the design of the memory interface
system. That is, fast data supply and memory access reduction are important.



55.3 Conclusion 

In this paper, the architecture of a rough set processor is described. Future
work includes logic synthesis, simulation, and implementation.

Acknowledgements. The authors are grateful to Prof. Andrzej Skowron 
A “chance” here means an event or a situation with significant impact on
human decision making - a new event/situation that can be conceived either
as an opportunity or as a risk. The “discovery” of a chance is to become aware
of and to explain the significance of a chance, especially if the chance is rare 
and its significance is unnoticed. Desirable effects of opportunities should be 
actively promoted, whereas preventive measures should be taken in the case 
of discovered risks. In other words, chance discovery aims to provide means 
for inventing or surviving in the future, rather than predicting the future. 

The essential aspect of a chance (risk or opportunity) is that it can be 
the seed of new and significant changes in the near future. The discovery of 
new opportunities might be more beneficial than reliance on past frequent 
success-patterns (usually used in prediction methods), because they are not 
known yet by oneself or one’s business rivals. The discovery of new risks 
might be indispensable to avoid or lessen damage, because they cannot be 
explained by past frequent damage-patterns. Therefore, being aware of a 
novel important event without ignoring it as noise in the data is essential for 
future success. Besides data mining methods for finding rare but important
events from time-series, it is also important to draw human attention to such
events, i.e., to make humans ready to catch chances. In this sense,
human-information interactions are highly relevant to chance discovery. Furthermore,
chance discovery can be seen as an extension of risk management to computer- 
aided problem solving where novel situations are involved. 

The workshop on Chance Discovery was intended to bring together studies 
from artificial intelligence, human-computer interaction, social and cognitive 
sciences, marketing research, risk management, knowledge discovery and
data mining, and other related domains, for presenting breakthroughs to 
real-world chance discoveries. 

Topics from information visualization and other human-information
interaction designs, for aiding human awareness and discovery of chances, were
discussed earnestly in the workshop, and this part presents the best 12 papers.
We propose the use of a dialectical argumentation formalism for chance
discovery in domains where knowledge is distributed across a number of distinct
knowledge-bases, as in a system of autonomous software agents. Each agent 
may have only a partial view of a problem, and may have insufficient kno- 
wledge to prove particular hypotheses; our formalism provides a means to 
aggregate across these partial views in a consistent manner. We identify a 
novel type of dialogue, which we call a discovery dialogue, and propose a 
formal model for its conduct. We then present locutions and rules for the 
implementation of these dialogues as dialogue-games. 



57.1 Introduction 

In 1994-5, the British Government privatized the state-owned national rail- 
way monopoly, British Rail. They did this by creating one private company, 
Railtrack Ltd, to own and operate the physical network of railway track, and 
then created 25 separate licences, each awarded by competitive tender, to 
operate train services along these tracks in specific geographic regions. In ad- 
dition, other companies were created to provide specific services to Railtrack 
and to the train operating companies, for instance, railway communications. 
The new private companies also outsourced functions which had previously 
been undertaken within British Rail, such as carriage ownership and track 
inspection and maintenance. By one estimate there are now more than 100 
companies where previously there was just one. Thus, what was once a single 
and unified system, is now fragmented, with disparate responsibilities, distri- 
buted knowledge and possibly conflicting interests. No one company in the 
network now has all the information needed to manage it. The results of 
this were seen on 17 October 2000, when faulty track caused a derailment 
at Hatfield, killing four people and injuring 70. Although an inquiry still has 
to establish ultimate responsibility for the accident, the faulty
track appears to have been known to the company tasked with network
inspection, a sub-contractor to Railtrack [57.4].

The problem of interest here is how to identify risks and opportunities 
(“Chance Discovery”) in situations where knowledge is distributed across 
multiple autonomous agents. We believe that systems of dialectical argumentation, 
which enable the coherent combination of disparate knowledge types 
and sources, are applicable to this problem and we present here a formalism 
for such a system. For simplicity, we assume that the agents involved do not 
perceive they have divergent economic or other interests, and so are willing 
to share information fully with each other. We present our formalism in Sec- 
tion 3, after briefly reviewing argumentation and formal dialogue games in 
Section 2. In earlier work, we presented a similar dialectical argumentation 
structure for dialogues over risk in environmental domains, which we 
termed a Risk Agora [57.7]. Accordingly, we call the formalism presented in 
this paper a Discovery Agora. Section 4 concludes the paper. 



57.2 Argumentation 

In common English usage, an “argument” has two meanings: a case for (or 
against) a particular claim, and a debate between two or more people. Ar- 
guments in both senses have been studied by philosophers since at least the 
time of Aristotle, in a branch of philosophy now called argumentation theory. 
In this paper, we will use the word argument for only the first meaning, and 
the words debate or dialogue for the second. Considering arguments as 
cases for claims, argumentation theory has examined issues such as what 
constitutes a good or bad argument, under what circumstances it is rational 
to use non-deductive arguments, and what relationships may exist between 
distinct arguments. For dialogues and debates, philosophers have explored 
issues such as how such debates may be organized and structured, what 
rules are appropriate for different types of interaction, and what impacts arise 
from variations in these rules. In both areas, philosophers of argumentation 
have been particularly active since the mid-1950s, perhaps in response to the 
development of formal, deductive logic in the century before that. See [57.11] 
for a comprehensive review of argumentation theory, and [57.2] for a review 
of some applications of argumentation in Artificial Intelligence. In this pa- 
per we will use formal dialogue games to model dialogues, so we first discuss 
different types of dialogues. 

In an influential typology, Doug Walton and Erik Krabbe [57.12] identi- 
fied six primary types of dialogue, distinguished by their initial situations, 
the goals of each of their participants, and the goals of the dialogue itself 
(which may differ from those of its participants). The six dialogue types 
were: Information-seeking dialogues, in which one participant seeks the answer 
to some question from another participant; Inquiries, in which all partici- 
pants collaborate to answer some question to which none has the answer; 
Persuasion dialogues, in which one participant seeks to convince others of 
the truth of some proposition; Negotiations, in which participants attempt to 
divide up a scarce resource; Deliberations, in which participants collaborate 
to decide what actions to take in some situation; and Eristic (strife-ridden) 
dialogues, in which participants quarrel verbally as a substitute for physical 
fighting. Most human dialogues may be seen as examples of these six or 
combinations thereof, although Walton and Krabbe do not claim their typology 
is comprehensive. 

How may specific types of dialogues be modeled? To do this, we draw 
on the formal dialogue games proposed by philosophers to better understand 
fallacious modes of reasoning [57.3]. These are games between two or more 
players, where the “moves” made by the players are locutions, i.e. spoken 
utterances. Recently, such games have been applied in Artificial Intelligence 
[57.1], particularly for automated dialogues between autonomous agents, and 
we have been led to propose a formal model for dialogue-games [57.8]. In our 
model, it is assumed that the topics of discussion between the participants are 
represented in some logical language, whose well-formed formulae are denoted 
by the lower-case Roman letters, p, q, r, etc. The rules of the dialogue-game 
can be divided into several distinct types: 

Commencement Rules: Rules which define the circumstances under which 
the dialogue commences. 

Locutions: Rules which indicate what utterances are permitted. Typically, 
legal locutions permit participants to assert propositions, permit others 
to question or contest prior assertions, and permit those asserting pro- 
positions which are subsequently questioned or contested to justify their 
assertions. 

Combination Rules: Rules which define the dialogical contexts under which 
particular locutions are permitted or not, or obligatory or not. For in- 
stance, it may not be permitted for a participant to assert a proposition 
p and subsequently the proposition ¬p in the same dialogue, without in 
the interim having retracted the former assertion. 

Commitments: Rules which define the circumstances under which partici- 
pants express dialogic commitment to a proposition. Typically, assertion 
of a claim p in the debate is defined as indicating to the other participants 
some level of commitment to, or support for, the claim. Since Hamblin 
[57.3], it is common to track commitments in publicly-accessible stores 
called Commitment Stores. 

Termination Rules: Rules which define the circumstances under which the 
dialogue ends. These rules may be expressible in terms of the contents of 
the Commitment Stores of one or more participants. 
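
The rule types above can be illustrated with a small sketch. It is not the formal model of [57.8]: the class and method names are hypothetical, propositions are plain strings with "~" standing for negation, and the termination rule (stop once every store contains the same proposition) is an assumption chosen only to show how such a rule can be read off the Commitment Stores. Commencement rules are not modelled.

```python
from dataclasses import dataclass, field


def negate(p: str) -> str:
    """Return the negation of a proposition written as 'p' or '~p'."""
    return p[1:] if p.startswith("~") else "~" + p


@dataclass
class CommitmentStore:
    """Publicly readable record of what one participant has asserted."""
    owner: str
    claims: set = field(default_factory=set)


class DialogueGame:
    def __init__(self, participants):
        self.stores = {name: CommitmentStore(name) for name in participants}
        self.open = True

    def assert_(self, speaker: str, p: str) -> bool:
        """Locution plus combination rule: refuse p while ~p is still held."""
        store = self.stores[speaker]
        if not self.open or negate(p) in store.claims:
            return False          # the move is illegal in this context
        store.claims.add(p)       # commitment rule: asserting commits
        return True

    def retract(self, speaker: str, p: str) -> bool:
        """Remove a prior commitment, if the speaker actually holds it."""
        store = self.stores[speaker]
        if p not in store.claims:
            return False
        store.claims.remove(p)
        return True

    def terminate_if_agreed(self, p: str) -> bool:
        """Assumed termination rule: end once every store contains p."""
        if all(p in s.claims for s in self.stores.values()):
            self.open = False
        return not self.open
```

In this sketch a participant who has asserted p can only assert ~p after retracting p, which is the combination rule given as an example above.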

Instantiating these rules for different types of dialogue has been a recent 
research focus. For example, Walton and Krabbe [57.12] presented formal 
models for persuasion dialogues, and, in joint work with David Hitchcock 
[57.6], we have proposed the first formal model for deliberation dialogues. 



57.3 The Discovery Agora: Formal Structure 

With these considerations in mind, we now present our formal structure for 
the argumentation system for chance discovery, which we have called a Di- 
scovery Agora. We assume, as above, that the topics of discussion between 
the participants are represented in some logical language, ℒ, closed under the 
usual connectives, whose well-formed formulae are denoted by the lower-case 
Roman letters, p, q, r, etc. Although the participants may believe different 
sets of axioms (premises) to be true, we assume they have agreed a set of 
deductive inference rules. We refer to this logical language and the agreed 
inference rules as the common logic of the Agora. We further assume a denu- 
merable set of autonomous software agents who participate in the dialogue, 
each of whom is denoted by Pi, indexed by i ∈ I. We assume in this paper 
that each agent uses the same logical language and rules of inference, and 
that they differ only in the information they know to be true (i.e. in their 
premises). Thus, each participant may know part of the story but not the 
whole story. We further assume that the agents have no inhibitions about 
sharing information with each other in the Agora. 

In this section we present the formal model for the Discovery Agora. 
We do this, firstly, in Section 3.1, with an informal discussion of discovery 
dialogues to motivate our formalism, then, in Section 3.2, a formal model of a 
Discovery Dialogue in the Agora. Section 3.3 presents the rules for a dialogue 
game in conformance with the formal model. 



57.3.1 Discovery Dialogues 

We assume the agents in the Discovery Agora are engaged in dialogue. Which 
of the Walton and Krabbe [57.12] dialogue types mentioned above is appro- 
priate to this domain? The closest type would appear to be an Inquiry dialo- 
gue, where participants collaborate to ascertain the truth of some question. 
However, for the domain of Chance Discovery we want to discover something 
not previously known; the question whose truth is to be ascertained may only 
emerge in the course of the dialogue. This feature is similar to Deliberation 
dialogues, where the course of action adopted by the participants may also 
only emerge in the course of the dialogue itself. The other five types of dialo- 
gue all begin with some question or issue for discussion. We therefore believe 
that the dialogue type appropriate to the Chance Discovery domain is not 
one of the types of Walton and Krabbe. We will call it a Discovery Dialogue. 

These dialogues differ from Inquiries in another way. In a pure inquiry, the 
participants would wish to seek the truth, unadulterated by their preferences 
or emotional responses. This is unlike a Deliberation, where the preferences 
or emotions of the participants could play an important part in the selection 
of an optimal course of action. While the participants in a Discovery 
dialogue are also seeking truth, there may be many possible truths. It would be 
sensible for the participants to filter the truths they discover by what is 
interesting, novel or important. Discovering risks, for instance, means identifying 
potential outcomes with significant and deleterious consequences. 

How might a dialogue concerning chance discovery proceed? We could 
imagine a number of elements to such a dialogue. Firstly, there would be 
agreement (perhaps implicit) about the purpose of the dialogue; this could 
be, for instance, to assess the risks inherent in some situation or technology. 
Next there may be the sharing of relevant information known by each of the 
participants and the pooling of this knowledge to generate new knowledge. 
For dialogues seeking to discover consequences or risks, there may also be 
discussion concerning the possible mechanisms by which such risks or conse- 
quences could occur. These mechanisms may be chains of possible scientific 
causality (as in cellular-level biomedical mechanisms) or metaphorical or ana- 
logical modes of reasoning. Legal reasoning concerning the potential motives, 
opportunity and means of a suspect to commit a crime is another example 
of such mechanisms. Then, once potential discoveries have been articulated 
in the dialogue, there may be discussion over their attributes. For instance: 
Are they equally important? What are their relative consequential losses or 
benefits? etc. This discussion over attributes may in turn lead to considera- 
tion of experiments or data collection activities to verify which of competing 
hypotheses is more likely correct in explaining causal effects. In human dia- 
logues, of course, such discussions do not occur in a linear fashion, but move 
back and forth between these various elements as the discussion evolves. Our 
formal model, to be presented next, will include each of these elements and 
allow for non-linear dialogues. 



57.3.2 Model of a Discovery Dialogue 

In this section we formalize the discussion just presented. We begin by de- 
fining the elements of the discovery dialogue, drawing on the model of an 
argument proposed by Stephen Toulmin [57.10]. 

Purpose: The Purpose of a dialogue is the overall issue or issues which 
motivated the participants to convene and which governs their dialogue. 
Examples include the risks or the opportunities of some situation. We 
assume that a discovery dialogue is initiated by one of the participants 
with a proposed purpose. However, the other participants may not share 
the same understanding of the dialogue’s purpose, and so this needs to 
be discussed at the outset. 

Data Item: A Data Item is a proposition for which at least one dialogue 
participant has a proof, using premises in that participant’s knowledge 
base and using the rules of inference of the common logic. Participants 
who articulate data items will be required to present the arguments for 
them, if requested in the Agora. 

Inference Mechanism: An inference mechanism is a warrant which justifies 
the drawing of a conclusion from one or more data items. Examples 
of mechanisms include: the rules of deductive inference of the common 
logic of the participants; default rules; causal mechanisms; metaphors 
and analogies; legal precedents, etc. 

Consequence: A consequence is a claim arising from the application of an 
inference mechanism to one or more data items. 

Criterion: A criterion is an attribute of a data item or a consequence, which 
may be used to compare one data item or consequence with another. Ex- 
amples of criteria include: novelty; importance; costs; benefits; feasibility; 
etc. 

Test: A test is a procedure, generally undertaken outside the Discovery 
dialogue, to ascertain the truth-value of some unknown variable. Exam- 
ples include: scientific experiments; data collection exercises; information- 
seeking dialogues. 

Conclusion: A conclusion is a full or partial response to the purpose of the 
dialogue. For example, conclusions could include significant risks identified 
in the course of the dialogue or interesting opportunities. 

With these elements defined, we now present a formal model of the dialogue 
itself, which moves through ten stages. Our model is similar in approach to the 
formal model for deliberation dialogues we developed with David Hitchcock 
in [57.6]. 

Open Dialogue: Opening of the discovery dialogue. 

Discuss Purpose: Discussion of the purpose of the dialogue. 

Share Knowledge: Presentation of data items relevant to the purpose, dra- 
wing only on each participant’s individual knowledge base. 

Discuss Mechanisms: Discussion of potential rules of inference, causal me- 
chanisms, metaphorical modes of reasoning, legal theories, etc. 

Infer Consequences: Identification of the consequences arising from the ap- 
plication of inference mechanisms to the data items presented by the 
participants. 

Discuss Criteria: Discussion of possible criteria for assessment of the conse- 
quences presented. 

Assess Consequences: Discussion of the data items and consequences against 
the criteria previously suggested. 

Discuss Tests: Discussion of need for undertaking tests of proposed conse- 
quences. If such tests are conducted outside the dialogue, the results 
may be reported back to the dialogue in a Share Knowledge stage. 

Propose Conclusions: Proposing one or more conclusions for possible accep- 
tance by the participants. 

Close Dialogue: Closing of the discovery dialogue. 



Agreement is not necessary in these dialogues unless the participants so desire 
it. If so, the Propose Conclusions stage allows a participant to propose a 
conclusion for possible acceptance, and then allows participants to indicate to 
the Agora their individual acceptance or otherwise. The stages of a discovery 
dialogue may be undertaken in any order and may be repeated, subject only 
to certain constraints, which we have articulated in [57.9]. For example, the 
Discuss Purpose stage must precede any instance of every other stage, excepting 
the Open Dialogue and Close Dialogue stages. 
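
The ordering constraint just mentioned can be checked mechanically. The following is a minimal sketch, not the full set of constraints of [57.9], which are not reproduced here. It assumes that Open Dialogue occurs once at the start and Close Dialogue once at the end, and otherwise enforces only the stated rule that Discuss Purpose precedes every other stage. The names Stage and is_admissible are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    OPEN = "Open Dialogue"
    PURPOSE = "Discuss Purpose"
    SHARE = "Share Knowledge"
    MECHANISMS = "Discuss Mechanisms"
    INFER = "Infer Consequences"
    CRITERIA = "Discuss Criteria"
    ASSESS = "Assess Consequences"
    TESTS = "Discuss Tests"
    CONCLUDE = "Propose Conclusions"
    CLOSE = "Close Dialogue"


def is_admissible(sequence):
    """Check a sequence of stages against the constraints assumed above.

    Assumptions: OPEN occurs exactly once, first; CLOSE exactly once, last;
    any other stage requires an earlier PURPOSE stage. Stages may otherwise
    repeat and occur in any order.
    """
    if len(sequence) < 2:
        return False
    if sequence[0] is not Stage.OPEN or sequence[-1] is not Stage.CLOSE:
        return False
    inner = sequence[1:-1]
    if Stage.OPEN in inner or Stage.CLOSE in inner:
        return False
    purpose_seen = False
    for stage in inner:
        if stage is Stage.PURPOSE:
            purpose_seen = True
        elif not purpose_seen:
            return False      # some stage ran before the purpose was discussed
    return True
```

A sequence such as Open, Purpose, Share, Infer, Share, Purpose, Conclude, Close is admissible under these assumptions, which reflects the non-linear character of the dialogues; Open, Share, Purpose, Close is not.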



57.3.3 Dialogue Game Rules 

We now present examples of dialogue-game locutions which, taken together, 
enable a discovery dialogue to be conducted according to the model just 
presented. For reasons of space, we do not present all the locutions, nor all 
the necessary pre-conditions for, and the consequences of, their utterance. We 
continue to assume that the subject-matter of dialogues can be represented 
in a propositional language by lower-case Roman letters. We define questions 
as propositions with one or more free variables, and we represent these by 
lower-case Roman letters suffixed with a question-mark, e.g., “p?”. We assume 
a Commitment Store CS(Pi) exists for each agent Pi. This store contains the 
various propositions which the agent has publicly accepted, and each store 
can be viewed by all participants. Entries in the stores are of three sorts: (a) 
2-tuples of the form (type, t), where t is a valid instance of type type, with 
type ∈ {purpose, data item, inference mechanism, consequence, criterion, 
test, conclusion}; (b) 3-tuples of the form (c, t, p), where c is a consequence, 
t is a criterion and p a proposition; and (c) 3-tuples of the form (c1, c2, t), 
where c1 and c2 are consequences and t a criterion. The permitted locutions 
are: 

open_dialogue(Pi,p): Participant Pi proposes the opening of a Discovery dia- 
logue to consider the proposed purpose p. A dialogue can only commence 
with this move. 

enter_dialogue(Pj,p): Participant Pj indicates a willingness to join a Disco- 
very dialogue to consider the purpose p. All intending participants other 
than the mover of open_dialogue(.) must announce their participa- 
tion with this move. Note that neither the open_dialogue(.) nor the 
enter_dialogue(.) move implies that the speaker accepts that p is the 
most appropriate formulation of the purpose, only that he or she is willing 
to enter into a discussion about it at this time. 

propose(Pi, type, t): Participant Pi proposes proposition t as a valid instance 
of type type, where type ∈ {purpose, data item, inference mechanism, 
consequence, criterion, test, conclusion}. 

assert(Pi, type, t): Participant Pi asserts proposition t as a valid instance of 
type type, where type ∈ {purpose, data item, inference mechanism, 
consequence, criterion, test, conclusion}. This is a stronger locution than 
propose(.), and results in the tuple (type, t) being inserted into CS(Pi), 
the Commitment Store of Pi. For certain types, utterance of this locution 
leads to the speaker having a burden of defence, i.e. to provide supporting 
arguments or evidence for the assertion if so requested by another 
participant. 

query(Pj, propose(Pi, type, t)): Participant Pj requests participant Pi to 
provide a justification for his proposal of t as a valid instance of type 
type, where type ∈ {data item, consequence, test}, and where j ≠ i. 
Similarly, a participant may query an assertion with the command 
query(Pj, assert(Pi, type, t)). In response to either query, Pi must 
defend his proposal or assertion with an utterance of show_arg(.). 

show_arg(Pi, type, t, A): Participant Pi presents an argument A for 
proposition t, which is of type type ∈ {data item, consequence, test}. In the 
case of data items, the argument A is a proof from premises in the knowledge 
base of participant Pi and using deductive inference rules in the common 
logic of the dialogue. In the case of consequences, the argument A 
comprises one or more sequences of the form (D, I, C), where D is a set of 
data items, I is an inference mechanism and C is a consequence which 
can be drawn from D using I; all elements of this set must previously 
have been articulated in the Agora by means of appropriate propose(.) or 
assert(.) locutions. In the case of tests, the argument A also comprises 
one or more sequences of the form (D, I, C), where D is a set of data 
items, I is an inference mechanism and C is a consequence which can be 
drawn from D using I, but these need not have been previously presented 
in the dialogue. 

assess(Pi, c, t, p): Participant Pi asserts that when consequence c is assessed 
on the basis of criterion t, one may conclude proposition p. This locution 
inserts (c, t, p) into CS(Pi). 

compare(Pi, c1, c2, t): Participant Pi asserts that consequence c1 is better than 
or equal to consequence c2 when they are compared on the basis of criterion 
t. Each of c1, c2 and t must previously have been articulated in the Agora 
by means of the appropriate propose(.) or assert(.) locutions. This 
locution inserts (c1, c2, t) into CS(Pi). 

recommend(Pi, conclusion, a): Participant Pi proposes proposition a as a 
recommended conclusion. This locution inserts (conclusion, a) into CS(Pi). 

accept(Pj, locution): Participant Pj indicates agreement with the prior 
locution, locution, uttered by another participant. If the prior locution 
resulted in a change to that speaker’s commitment store then the accept(.) 
locution similarly alters CS(Pj). 

contest(Pj, locution): Participant Pj indicates disagreement with the prior 
locution, locution, uttered by another participant. The contest(.) 
locution is the obverse of accept(.) and has no impact on CS(Pj). 

retract(Pi, locution): Participant Pi indicates retraction of her prior 
utterance of the locution locution. If the prior locution resulted in an 
insertion into CS(Pi), then the retract(.) locution deletes it. 

withdraw_dialogue(Pi, p): Participant Pi announces her withdrawal from the 
Discovery dialogue to consider the governing question p. 
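
The effect of these locutions on the Commitment Stores can be sketched as follows. This is an illustration under simplifying assumptions, not an implementation of the full game. Pre-conditions such as prior articulation of c1, c2 and t are not checked, query(.) and show_arg(.) are omitted, a prior locution is referred to by its position in a log, and the class name Agora and the example propositions are hypothetical.

```python
from collections import defaultdict

TYPES = {"purpose", "data item", "inference mechanism", "consequence",
         "criterion", "test", "conclusion"}


class Agora:
    """Tracks only the commitment-store effects of a few locutions."""

    def __init__(self):
        self.cs = defaultdict(set)   # CS(Pi): participant -> set of tuples
        self.log = []                # every locution uttered so far, in order

    def _utter(self, speaker, name, entry):
        self.log.append((speaker, name, entry))
        return len(self.log) - 1     # index used to refer back to a locution

    def propose(self, speaker, type_, t):
        assert type_ in TYPES
        return self._utter(speaker, "propose", None)    # no commitment

    def assert_(self, speaker, type_, t):
        assert type_ in TYPES
        self.cs[speaker].add((type_, t))                # entry of sort (a)
        return self._utter(speaker, "assert", (type_, t))

    def assess(self, speaker, c, t, p):
        self.cs[speaker].add((c, t, p))                 # entry of sort (b)
        return self._utter(speaker, "assess", (c, t, p))

    def compare(self, speaker, c1, c2, t):
        self.cs[speaker].add((c1, c2, t))               # entry of sort (c)
        return self._utter(speaker, "compare", (c1, c2, t))

    def accept(self, speaker, index):
        # Mirror the prior speaker's insertion, if there was one.
        _, _, entry = self.log[index]
        if entry is not None:
            self.cs[speaker].add(entry)
        return self._utter(speaker, "accept", entry)

    def contest(self, speaker, index):
        return self._utter(speaker, "contest", None)    # no effect on CS

    def retract(self, speaker, index):
        prior_speaker, _, entry = self.log[index]
        assert prior_speaker == speaker, "only one's own locutions"
        self.cs[speaker].discard(entry)                 # no-op if entry is None
        return self._utter(speaker, "retract", None)
```

For example, if P1 asserts the data item "track_is_faulty" and P2 accepts that locution, the tuple appears in both CS(P1) and CS(P2); a later retraction by P1 removes it from CS(P1) only.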

We now demonstrate that the dialogue game locutions we have defined can 
be used to undertake a Discovery dialogue in accordance with the formal 
model we have proposed. 

Proposition 1 Each of the ten stages of the formal model of discovery 
dialogues presented in section 3.2 can be executed by judicious choice of 
these dialogue-game locutions. 

Proof. We consider each stage in turn: 

1. A dialogue opens with the locution open_dialogue(Pi, p) and at least 
one utterance of enter_dialogue(Pj, p), with j ≠ i. 

2. The Discuss Purpose stage consists of utterances of propose(.), 
assert(.), accept(.), contest(.) and retract(.), in each case with the 
type purpose. 

3. The Share Knowledge stage consists of utterances of propose(.), 
assert(.), accept(.), query(.), show_arg(.), contest(.) and retract(.), 
in each case with the type data item. 

4. The Discuss Mechanisms stage consists of utterances of propose(.), 
assert(.), accept(.), contest(.) and retract(.), in each case with the 
type inference mechanism. 

5. The Infer Consequences stage consists of utterances of propose(.), 
assert(.), accept(.), query(.), show_arg(.), contest(.) and retract(.), 
in each case with the type consequence. 

6. The Discuss Criteria stage consists of utterances of propose(.), 
assert(.), accept(.), contest(.) and retract(.), in each case with the 
type criterion. 

7. The Assess Consequences stage consists of utterances of assess(.), 
compare(.), accept(.), contest(.) and retract(.). 

8. The Discuss Tests stage consists of utterances of propose(.), assert(.), 
accept(.), contest(.) and retract(.), in each case with the type test. 

9. The Propose Conclusions stage consists of the recommend(Pi, conclusion, a) 
locution, possibly followed by utterances of accept(Pj, recommend(Pi, conclusion, a)). 

10. Participants may exit a dialogue, by means of the withdraw_dialogue(Pi, p) 
locution, at any time. The dialogue itself closes once the 
second-last participant utters this locution. □ 
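
The case analysis in the proof can be restated as a lookup table from stages to locutions. The sketch below only transcribes the ten cases above and checks that every locution defined in this section is used by at least one stage. The names STAGE_LOCUTIONS and legal_stages are hypothetical, and the Assess Consequences entry uses accept, the agreement locution defined above.

```python
COMMON = {"propose", "assert", "accept", "contest", "retract"}
ARGUED = COMMON | {"query", "show_arg"}

# One entry per stage of the formal model, in the order of the proof.
STAGE_LOCUTIONS = {
    "Open Dialogue": {"open_dialogue", "enter_dialogue"},
    "Discuss Purpose": COMMON,
    "Share Knowledge": ARGUED,
    "Discuss Mechanisms": COMMON,
    "Infer Consequences": ARGUED,
    "Discuss Criteria": COMMON,
    "Assess Consequences": {"assess", "compare", "accept", "contest", "retract"},
    "Discuss Tests": COMMON,
    "Propose Conclusions": {"recommend", "accept"},
    "Close Dialogue": {"withdraw_dialogue"},
}

ALL_LOCUTIONS = {
    "open_dialogue", "enter_dialogue", "propose", "assert", "query",
    "show_arg", "assess", "compare", "recommend", "accept", "contest",
    "retract", "withdraw_dialogue",
}


def legal_stages(locution):
    """Return the stages in which the given locution may be uttered."""
    return [s for s, locs in STAGE_LOCUTIONS.items() if locution in locs]


# Locutions that no stage makes use of (expected to be empty).
uncovered = ALL_LOCUTIONS - set().union(*STAGE_LOCUTIONS.values())
```

Such a table could serve as the basis for a combination rule that rejects a locution uttered outside the stages that permit it, although the rules of [57.9] may be finer-grained than this.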

In addition to defining the permitted locutions, we have specified commence- 
ment, combination, commitment and termination rules for this game [57.9]. 
These rules have been specified to accord with principles of joint mutual 
inquiry between consenting participants proposed by Hitchcock in [57.5]. 



57.4 Conclusion 

This paper has proposed a formal argumentation system for chance discovery 
in domains where knowledge is distributed between autonomous agents. Our 
approach has led us to propose a new type of dialogue, which we call a chance 
discovery dialogue, for arguments in this domain. We have proposed a formal 
model for the conduct of discovery dialogues, and presented the locutions 
and rules for a dialogue game undertaken in accordance with this model. 

We are currently considering a number of further research lines, in parti- 
cular the question of whether discovery dialogues can be automated. We are 
exploring the use of evolutionary computational approaches to enable fully 
automated dialogues to be conducted. For example, it may be possible to use 
a genetic algorithm to generate candidate discoveries which are then proces- 
sed through the dialogue-game described above. Those which are accepted 
by the participants to the discussion may be considered the fittest survivors 
and so form the basis for a subsequent generation of discoveries, which are 
discussed in subsequent dialogues. 
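
To make the idea concrete, the following sketch shows one way such a loop might look. It is speculative and is not a design proposed in this paper. A candidate discovery is represented as a set of hypothetical data-item labels, the whole dialogue is replaced by a stand-in acceptance test, and a mutation-only evolutionary loop is used as a simplification of a genetic algorithm (there is no crossover). All names are assumptions.

```python
import random

ITEMS = ["d1", "d2", "d3", "d4", "d5", "d6"]   # hypothetical data items


def accepts(agent, candidate):
    """Stand-in for a full dialogue: an agent accepts a candidate discovery
    if it is non-empty and contains nothing that the agent rejects."""
    return bool(candidate) and not (candidate & agent["rejects"])


def dialogue_filter(population, agents):
    """Survivors are the candidates accepted by every participant."""
    return [c for c in population if all(accepts(a, c) for a in agents)]


def mutate(candidate, rng):
    """Flip the membership of one randomly chosen item."""
    item = rng.choice(ITEMS)
    return frozenset(candidate ^ {item})


def evolve(agents, generations=5, size=8, seed=0):
    """Run a few generations and return the finally accepted candidates."""
    rng = random.Random(seed)
    population = [frozenset(rng.sample(ITEMS, 2)) for _ in range(size)]
    for _ in range(generations):
        survivors = dialogue_filter(population, agents)
        if not survivors:
            break
        # Refill the next generation with mutated copies of survivors.
        population = survivors + [mutate(rng.choice(survivors), rng)
                                  for _ in range(size - len(survivors))]
    return dialogue_filter(population, agents)
```

In a real system the accepts function would be replaced by an actual run of the dialogue game, with acceptance read off the participants' Commitment Stores.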



This paper investigates the methodological foundations of a new research field 
called chance discovery, which aims to detect future opportunities and risks. 
Drawing on concepts from cybernetics and system theory, it argues that 
chance discovery best applies to open systems that are equipped with regulatory 
and anticipatory mechanisms. Non-determinism, freedom (entropy) and 
the open-systems property are motivated as basic assumptions underlying chance 
discovery. The prediction-explanation asymmetry and the evaluation of chance 
discovery models are discussed as fundamental problems of this field. 



58.1 Introduction 

Several researchers within the Knowledge Discovery in Databases (KDD) 
community (e.g., Ohsawa [58.9]) have questioned whether the methods of this 
research field are able to find what they call ‘chances’. Chances refer to 
phenomena that might have a (high) impact on the scientific (and human) society or 
an enterprise in the future. High impact is intended to have two complemen- 
tary readings: on the one hand it refers to opportunities, i.e., the possibility 
to bring about desirable effects; on the other it refers to risks, i.e., possible 
threats to an enterprise or society. The notion of chance discovery has been 
coined to cover both aspects. Finding future features is seen in contrast to 
prediction (e.g., in KDD), the scientific activity to derive phenomena that 
appear at some future time point. By contrast, chance discovery explicitly 
integrates human initiative into the discovery process. 

Procedurally, chance discovery can be seen as a two-step activity. The 
first step involves the actual discovery of a certain phenomenon. The second 
step suggests actions taken as a consequence of a designated phenomenon 
(chance), which is often called (chance) management and involves supportive 
measures in the case of opportunities as well as preventive measures in the 
case of risks. 

Although there might be some interesting interactions with the proba- 
bilistic notion of chance, this reading is not intended in chance discovery. 
Likewise, chance discovery is not concerned with discovery by chance, such 
as the discovery and isolation of penicillin by Alexander Fleming. 

We will discuss the following topics. In the following section, the notion of 
open system is explicated in terms of cybernetics and system theory, and the 
possibility of prediction is discussed for both nature and open systems. The 
next section discusses chance discovery in open systems. In particular, the 
notion of ‘anticipation’ is introduced as a mechanism for chance discovery 
and illustrated by examples. After that, we explicate notions underlying 
the possibility of chance discovery: uncertainty and freedom. In the following 
section, chance discovery is contrasted with KDD. Finally, we briefly discuss 
and conclude the paper. 



58.2 Nature vs. Open Systems 

To clarify the application field of chance discovery, we draw a broad 
distinction regarding the object of investigation: nature vs. open systems [58.12]. 
Whereas nature is governed by natural laws, open systems are typically mo- 
deled abstractly by cybernetics [58.1] and system theory [58.16]. Examples of 
open systems include ‘living’ systems such as human beings, scientific com- 
munities and companies, and artificial (or technical) systems, e.g., cars and 
power plants. Both kinds can be described by the following system-theoretical 
(S1–S2) and cybernetical (C1–C2) features (Schurz [58.12]): 

S1 Open systems are physical ensembles placed into an environment 
significantly larger than themselves. There is a continuous exchange of energy 
between system and environment. The environment may satisfy the 
system’s ‘needs’ (see C1) or ‘destroy’ the system (see C2). 

S2 Open systems preserve a relative identity through time, called their dis- 
sipative state. 

C1 The identity in time is abstractly governed by ideal states (or norm sta- 
tes) which the system tries to approximate, given its actual state. 

C2 Regulatory mechanisms compensate disturbing influences of the envi- 
ronment, i.e., they continuously try to counteract influences that move 
the system away from its ideal state. If the external influences exceed a 
‘manageable’ range, the system is destroyed. 

For our present discussion, the regulatory mechanisms of open systems are 
of central concern since they can actively interfere with the evolution of the 
system, by bringing about (an approximation of) the ideal state, or by avoiding 
the destruction of the system. Later, we will introduce a new kind of 
mechanism, called ‘anticipation’, that has the potential to significantly influence 
the system’s evolution and most closely corresponds to our notion of chance 
discovery. 
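
The interplay of ideal state, disturbance and regulation can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not a model taken from Schurz [58.12]: the system state is a single number, the regulator is a simple proportional correction, and the parameter names gain and manageable are hypothetical.

```python
def regulate(state, ideal, disturbances, gain=0.5, manageable=10.0):
    """Simulate a regulated open system over a series of disturbances.

    Returns (trajectory, survived). The run stops early, with survived set
    to False, when a disturbance exceeds the manageable range.
    """
    trajectory = [state]
    for d in disturbances:
        if abs(d) > manageable:
            return trajectory, False        # the system is 'destroyed'
        state += d                          # the environment pushes the system
        state += gain * (ideal - state)     # the regulator pulls it back
        trajectory.append(state)
    return trajectory, True
```

With an ideal state of 0, a single disturbance of 4 followed by calm periods is progressively compensated, whereas one disturbance beyond the manageable range ends the simulation.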

58.2.1 Prediction in the Natural Sciences 

Nature is governed by the laws of physics, e.g., Newton’s second axiom (the 
total force law). Obviously, in the physics domain there is no way to influence 
the natural laws. So even if we predict a phenomenon of high impact on 
society, such as a giant meteorite approaching the earth at high speed, all we 
can do is to evacuate the area the meteorite is predicted to hit. 

Since it is not possible to change the course of nature, chance discovery 
here means to take appropriate (supportive, preventive) measures to minimize 
damage or maximize benefit. 

58.2.2 Prediction in Open Systems 

Open systems are characterized by system laws. Schurz [58.12] argued that 
we are theoretically unable to determine the exact numerical values corre- 
sponding to system laws, because the systems are open and hence described 
by nonlinear differential equations. In the extreme case, if external influences 
exceed the manageable (or critical) range of the system, nonlinear dynamics 
becomes effective and leads to chaotic behavior. Due to the sensitivity of 
open systems to external influences, prediction is a difficult matter. Below 
we will argue that in open systems, the activity of regulatory mechanisms is 
of major importance, rather than prediction. 
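
The sensitivity that makes prediction difficult can be demonstrated with a standard textbook example. The sketch below uses the logistic map, a nonlinear difference equation that stands in for the nonlinear differential equations mentioned above; it is not an example from Schurz [58.12]. Two orbits that start almost at the same point are compared step by step.

```python
def logistic_orbit(x0, r=4.0, steps=100):
    """Iterate the logistic map x -> r * x * (1 - x) starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two initial states that differ by one part in ten billion.
a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)

# Pointwise distance between the two orbits.
gap = [abs(x - y) for x, y in zip(a, b)]
```

The initial gap is negligible, yet within a few dozen iterations the two orbits become unrelated, so that a tiny measurement error in the initial state rules out long-range numerical prediction.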



58.3 Chance Discovery in Open Systems 

58.3.1 Enterprise Example 

Let us first give an illustrative example. Enterprises (companies) can be 
viewed as open systems that consist of subsystems (branches, sections, and 
individuals), and operate in an environment, the so-called ‘economic 
market’. This environment typically satisfies the company’s ‘needs’, e.g., 
customers demand the company’s products. Under unfortunate circumstances, the 
company may run into the risk of being ‘destroyed’, e.g., by the appearance 
of a strong competitor (cf. S1). In spite of that, companies preserve identity 
through time (cf. S2). A company constantly tries to approximate an ideal 
state where, for instance, increasing profits are made and the economic situa- 
tion of the company is stable. This is achieved by the company’s subsystems 
that perform certain functions, including good production and distribution, 
and marketing (cf. C1). A company is typically confronted with a multitude 
of ‘disturbing’ influences in the form of, e.g., cheaper and better products of 
other companies and changing customer needs. At this point, the regulatory 
mechanisms of the company come into force, e.g., to lower production costs 
by increasing the efficiency of the production cycle. It is well-known that 
companies go bankrupt when a critical range is exceeded (cf. C2).

58.3.2 The Limits of Regulatory Mechanisms 

Regulatory mechanisms are the system’s means to approximate the system’s 
ideal state. Those mechanisms are mainly active to compensate disturbing 
influences by reacting to them. Although regulatory mechanisms are usually
able to guarantee the identity of an open system, they come into force only if 
confronted with ‘threats’ from the environment. For instance, if a company’s 
sales decrease, the CEO might decide to shrink the company, thereby making 
a number of people unemployed. 

In the next section we will argue that in addition to regulatory mecha- 
nisms, open systems need mechanisms of anticipation to cope with the com- 
plexities and influences of the environment. 

58.3.3 Chance Discovery as Anticipation 

In a recent report to the Club of Rome, Botkin et al. [58.2] introduce the 
term “anticipation” as a key feature of innovative learning that emphasizes 
human initiative. It is described as follows [58.2, p. 25]: 

[...] anticipation is not limited to simply encouraging desirable trends
and averting potentially catastrophic ones: it is also the “inventing”
or creating of new alternatives where none existed before.

Anticipation is contrasted to prediction, since the former focuses on the crea- 
tion of possible and desirable futures, and plans to bring them about. The 
notion of anticipation shares the intuition of Alan Kay’s phrase “The best 
way to predict the future is to invent the future”.

Promotion. In philosophy of science, the term “self-fulfilling prophecy” de- 
scribes situations such as the following. Newspapers write articles about the 
morbidity of a bank institute. As a consequence, many customers of this in- 
stitute withdraw their money and other commitments. In effect, the bank 
institute gets into serious trouble. A recent ‘real’ example is the success of 
the so-called New Economy (internet and telecommunication related shares). 
Since many people believed in its success, it became a great success (at least 
for some time). 

Chance discovery as anticipation in this context means the promotion 
of a trend desired by New Economy companies. As a result of promotion, 
the desired trend could be effected. Similar forms of promotion are daily 
practice in companies: certain products are advertised with the hope that 
they actually trigger a desire in customers. The detection of ‘latent’ customer 
desires will be briefly discussed in the next section. 

Collaboration. In business there is a lot of talk about ‘mergers’. Collabo- 
rations are also seen in scientific research programs. We will briefly describe 
the field of Quantum Computation.

Deutsch [58.3] is reported to be the first to explicitly ask whether it is 
possible to compute more efficiently on a quantum computer. For a long time, 
this possible collaboration of quantum theory (physics) and artificial intel- 
ligence (computer science) remained a curiosity. However, there are already 
some indications of ‘killer applications’ for quantum theory. For instance,
Spector et al. [58.13] report on problems that take polynomial time on a
quantum computer but exponential time on a classical computer. 

In academia, possibilities for collaboration are ubiquitous, and sometimes
realized; e.g., in genome analysis, artificial intelligence and biology
collaborate. What might chance discovery as anticipation mean here? In
particular, how can we anticipate the success of a certain kind of collaboration? We
cannot provide a working methodology here. In the case of quantum com- 
putation, the chance was ‘discovered’ by Feynman [58.5] who observed that 
classical systems cannot effectively model quantum mechanical systems. This 
observation suggests that computers based on the laws of quantum mechanics 
(instead of classical physics) could be used to efficiently model quantum me- 
chanical systems, and possibly even solve classical problems such as database 
search in a highly efficient way. 

Given that Quantum Computation will indeed be successful, how could 
we have known 10 years ago? One method would be to track the history 
of ‘conjectures’ (ideas, observations) formulated by various insightful rese- 
archers, and evaluate their feasibility in the light of current knowledge in 
possibly quite different research areas. The availability of huge amounts of 
information on the Web might facilitate such an endeavor. 



58.4 Chance Discovery, Uncertainty, Freedom 

One of the tacit assumptions underlying chance discovery is that the future 
is uncertain, and hence there is freedom to change the course of action. For
the sake of argument, assume the opposite, i.e., the world history evolves de- 
terministically. Obviously, under this artificial assumption, chance discovery 
(in our sense) is not possible as there are no choices. 



58.4.1 Freedom 

Following [58.15], we propose entropy as the measure of freedom.
Specifically, this measure of freedom is phenomenologically rather than
procedurally oriented. The freedom of a set A of alternatives is measured by the
entropy H of the actually chosen proportions, i.e.,

$$H(A) = -\sum_{i \in A} p_i \log p_i$$

where log is to the base 2, $p_i \geq 0$, and if $p_i = 0$ then $0 \log 0 = 0$.
Accordingly, we may say that chances exist if there are (almost) evenly distributed
alternatives. Consider the following situations (A) and (B). 

(A) There are three sellers, each with (approximately) 30% market share.

(B) There are two sellers, one has 75%, the other has 25% market share. 

Situation (A) has more freedom than situation (B), since a market with
one dominant provider has low entropy. The more interesting notion here is 
freedom of successive states for a number of time periods. For instance, a 
market with 100% customer loyalty is not free. 
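
The comparison of situations (A) and (B) can be checked numerically. The following is only a minimal sketch: the function name `freedom` is hypothetical, and situation (A) is assumed to consist of three sellers holding exactly one third of the market each, so that the proportions sum to one.

```python
import math

def freedom(shares):
    """Entropy H in bits of the chosen proportions; 0 * log 0 is taken as 0."""
    return sum(-p * math.log2(p) for p in shares if p > 0)

# Situation (A): three sellers, assumed to hold exactly one third each.
h_a = freedom([1 / 3, 1 / 3, 1 / 3])
# Situation (B): two sellers with 75% and 25% market share.
h_b = freedom([0.75, 0.25])
# Degenerate case: a single provider holds the whole market.
h_single = freedom([1.0])

print(f"H(A) = {h_a:.3f} bits")       # about 1.585
print(f"H(B) = {h_b:.3f} bits")       # about 0.811
print(f"H(single) = {h_single:.3f} bits")
```

Under these assumptions, (A) yields roughly 1.585 bits and (B) roughly 0.811 bits, which agrees with the claim that (A) has more freedom than (B).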

58.4.2 Explaining versus Predicting 

Let us recall the aforementioned open system situation, that features a high 
degree of uncertainty, and formulate it as a problem for chance discovery and 
chance management (CD&CM). In the following, M stands for a CD&CM 
model (or theory). 

— Assume as given a model M that explains why a particular phenomenon 
X turned out to be a chance (opportunity or risk), as observed by its high 
(positive or negative) impact. 

— Given a phenomenon of type X, can we employ M to predict high impact 
under comparable circumstances? 

Of course, the notions of ‘phenomenon of type’ and ‘comparable’ warrant further
explication. In order to clarify the problem, consider the case of simple un- 
stable or chaotic systems that support explanations without predictive value. 
Assume an ideal ball exactly on top of another ideal ball. Here, we cannot 
predict in which direction the ball will roll down, but after it has rolled down,
we can explain it by an unmeasurably small disturbance in the direction in
which the ball rolled down [58.11]. 

Thus, the ‘explanation vs. prediction’ problem raises the fundamental 
question about which systems support the predictive use of chance discovery 
results. Straightforward answers seem to be ruled out by the fact that human 
initiative is essential to take opportunities or avoid risks, and by the complexity
of systems such as the web or financial markets.

As a more realistic example, consider Ogawa’s [58.7] ILE (Information of 
Liability and Equity) measure that identifies risk factors that eventually lead 
to bankruptcy. Specifically, ILE explains bankruptcy. The crucial question, 
however, as in science, is whether ILE can predict bankruptcy. If ILE has
predictive value, the impact of preventive measures can be proven. Given the
theoretical result about the infeasibility of prediction in open systems, we are
left with a probabilistic notion of prediction. 



58.5 Scientific Evaluation of Theories 

A basic question about scientific theories is how they can be evaluated.
Following Popper [58.10], a theory is corroborated (or validated) if it predicts
a phenomenon that is actually observed, while it is falsified when a
phenomenon is observed that contradicts its prediction. Note that a theory can
never be verified by a finite set of observations. The situation for CD&CM
models is complicated for the following reason. 

Triple-theory Problem. Whether the discovery of a potential chance turns 
into a positive result is dependent on three factors: 

1. The designated phenomenon was a ‘real’ chance, i.e., chance discovery is 
successful. 

2. The chosen measures were appropriate, i.e., chance management was suc- 
cessful. 

3. The predictions about the world for the associated time span of CD&CM
were sufficiently accurate. 

The triple-theory problem refers to the practical problem that in order to 
validate (or falsify) a CD&CM model, three sub-theories have to be successful. 
If all of them are successful, as indicated by a positive result, the model is
corroborated. However, in the case of a negative result, we cannot simply say
that the designated phenomenon was no chance, because it may be that we did not
choose appropriate (supportive or preventive) measures to bring about the
positive outcome, or that our predictions about the boundary conditions for the
positive outcome were false.

From a methodological point of view, the triple-theory problem casts
serious doubt on whether we might be able to evaluate CD&CM models
scientifically. Due to the very nature of open systems, reproducibility of results
is infeasible. 



58.6 Chance Discovery vs. KDD 

Fayyad et al. [58.4] characterize Knowledge Discovery in Databases (KDD) 
as 

[...] the nontrivial process of identifying valid, novel, potentially use- 
ful, and ultimately understandable patterns in data. 

The discovery goal in KDD can be divided into a descriptive and a predictive 
part. In description the system seeks patterns (or models) in order to
present them to the user in an intelligible way; in prediction the system finds 
patterns so that the future behavior of some entity can be predicted. There 
exist a number of established (mostly statistical) data mining methods to 
achieve those goals, such as classification, regression, clustering, summariza- 
tion, dependency modeling, and change and deviation detection [58.4]. 

Chance discovery may use the knowledge extracted by data mining me- 
thods to detect future features. For instance, by Web usage mining, i.e., the 
clustering of Web users based on their browsing activities, potential customer 
groups can be identified, and specifically addressed by companies. Here the in- 
terplay of data mining — describing correlations between users’ interests — and 
chance discovery — actively promoting a possibility — is of crucial importance. 

One may ask whether, e.g., data mining already is a form of chance
discovery. Our answer is “no”. Data mining can summarize or predict trends,
but leaves out the role of human interference. Anticipation as a mechanism 
of an open system, on the other hand, ‘matches’ a desired (or predicted) 
trend with the system’s goals (typically human ‘desires’) and accordingly 
takes supportive or preventive measures. 

Another way of contrasting Chance Discovery and KDD is as follows. 
Whereas KDD tries to detect the most likely trends in data, Chance Discovery
aims at finding data that do not match likely patterns but indicate interesting
phenomena not yet exploited and bearing the potential of future trends. However,
currently there exists no serious analysis to distinguish those high-potential
phenomena from ‘noise’ in data. Basically, this means that exceptions can be
as informative as highly probable regularities. As an example, consider
the following. Humans that are infected with Plasmodium vivax are very
likely to contract malaria. However, some people do not. In KDD terms,
those people are ignored since they do not fall under the likely case (contracting
malaria). It turned out that it is due to a special genetic constellation that 
some people have a strong protection against malaria. In Chance Discovery 
terms, the explanation of those people’s resistance against malaria is a chance 
for a significant scientific discovery. 



58.7 Discussion and Conclusion 

In this paper, we explicate our take on a new research area called ‘Chance Di- 
scovery’. The notion of ‘open system’, as characterized in cybernetics and sy- 
stem theory, serves as a framework to embed the activity of Chance Discovery. 
In particular, anticipation is introduced as a mechanism that may perform 
the role of detecting chances in open systems. The anticipating mechanism is 
explained in the context of promotion in New Economy and collaboration in 
the Quantum Computation research programme. Chance Discovery is con- 
trasted to KDD and mutually beneficial aspects are explained. We identify 
human initiative as a distinguishing feature of Chance Discovery (as oppo- 
sed to KDD), e.g., to actively initiate and foster a trend by promotion or to 
actively explore the (practical) feasibility of a theoretical conjecture. 

Unlike the practical methods for data mining, we only described a metho- 
dology for Chance Discovery. A method for Chance Discovery might analyze 
‘success stories’, i.e., cases where features of high impact for the future were 
successfully identified and accordingly promoted by human initiative. This 
retrospective analysis might be framed and processed by means of KeyGraph 
[58.8], a smart indexing method originally developed for information retrieval. 

Recently, McBurney and Parsons [58.6] proposed principled methods to 
discover chances based on dialogue games. In the context of e-commerce 
systems, Stolze and Strobel [58.14] investigate interviews with buyers in order 
to identify their (implicit) needs. We believe that the theoretically founded
methods will have the greatest impact on the field of Chance Discovery. 

In this paper, we mainly focussed on the epistemological aspect of chance 
discovery. However, the discovery of potential opportunities and risks seems 
to be intimately connected to questions about human values, what should be 
the case and what should not be the case. Obviously, there are no
opportunities or risks per se; they are only given with respect to certain values and
associated goals of humans. To give a drastic example, the detection of a future
earthquake is not only a high risk for people living in a particular region, it is 
also an opportunity for certain organizations to take advantage of the chaos 
following the earthquake. 



Discovering new topics which cover new items, problems, and ideas (e.g.,
mobile phones, global warming, the human genome project, etc.) is truly profitable,
important, and interesting for us. For instance, (1) companies producing ‘mobile
phones’ have made large profits through strong sales, (2) the awareness of
‘global warming’ has improved the environment of the earth by regulating
exhaust emissions, and (3) fatal illnesses might be conquered by the human genome
project. However, since we cannot completely decode the world surrounding 
us, we cannot know the topics and their mechanisms in advance. Considering 
this situation, these phenomena could be a big chance for our activities. In 
this paper, we describe our approach for discovering the future directions of 
communities on the web to detect chances. 



59.1 Introduction 

Often, a new topic suddenly becomes popular although it seems insignificant 
at first sight. The Tipping Point describes this kind of phenomenon where 
a ’little’ thing can make a big difference [59.1]. We are deeply confused by 
changes that happen suddenly. However, since we cannot completely decode 
the world surrounding us, we cannot know the chances and their mechanisms 
in advance. Considering this situation, the Tipping Point could be a big 
chance for our activities. We understand ’topics’ in the broad sense that 
cover new items, problems, ideas, and so on. Below, we show you some recent 
examples of new topics: 

Mobile Phone: Considering the context of the appearance of mobile phones, 
there were essentially two factors. First, mobile phones overcame the
inconvenience of beepers, which required people to find a public phone
whenever a beeper rang. Second, mobile phones were equipped with the functions
of the Internet and E-mail services. Due to the synergy effects of these 
factors satisfying our needs, mobile phones began to get popular. 

Global Warming: The awareness of global warming brought about the collaboration
of automobile and environmental preservation communities, and consequently
led to hybrid automobiles which have minimal exhaust
emissions for preserving the environment of the earth. 

Human Genome Project: Many researchers in the fields of artificial intelligence,
biology, and medical science are collaborating on the human genome
project to analyze the human genome and to reveal its effects. The
human genome project is in the limelight because we expect
it to lead to the conquest of fatal illnesses.

In many cases, these topics were born when new collaborations of existing
topics satisfied our potential needs or demands. Although the hidden factors
might only be ‘submerged’ in the human mind, we believe that a few signs
can be mined from a database reflecting human thought. For this purpose,
the web is an attractive information source for its sheer size and sensitivity
to trends. The web consists of an abundance of communities [59.2, 59.4], each
corresponding to a cluster of web pages sharing a common interest. However,
the communities are not independent but are related to each other in
varying degrees. From this point of view, we expect that the relations among
communities might show the future directions of communities, and suggest
the potential needs or demands.

In this paper, we describe our approach for discovering the future direc- 
tions of communities by exploring the link structure of the web. We have 
implemented a prototype system named ChanceFinder that visualizes the 
future directions of communities and ranks promising web pages and links. 
Empirically, ChanceFinder showed some interesting directions for some to- 
pics. 

The rest of this paper is organized as follows. In Section 59.2, we introduce 
related research, and the process of ChanceFinder is described in Section
59.3. The experiments are discussed in Section 59.4, and finally we conclude 
this paper in Section 59.5. 



59.2 Related Research

Our research consists of two parts: the discovery of communities, and the 
discovery of relations among these communities. In this section, we introduce 
research related to these two processes.

59.2.1 Discovery of Communities 

A community on the Web is defined as a cluster of web pages which share 
common topics. However, there are many ways to detect the clusters. 

For example, Broder et al. [59.2] reported on an algorithm for clustering
web pages based on their contents. This approach can be applied not only
to hypertext (e.g., web pages) but also to plain text. However, indexing web
pages accurately is difficult because the contents of web pages are not always 
meaningful. 

In contrast to the content-based approach, links in web pages can be
reliable information because they reflect human judgment. Botafogo and
Shneiderman [59.3] proposed an idea for abstraction called aggregate, based
on graph theory. Their algorithm iteratively removes ‘indices’ (nodes with a
high number of out-links) and ‘references’ (nodes with a high number of in-links)
to clear the graph. However, removed nodes are often very important elements
for understanding the web. On the other hand, Kumar et al. [59.4] defined
a community on the web as a dense directed bipartite subgraph, and discovered
over 100,000 communities. However, the scale of the subgraphs depends on
the parameters. This implies the difficulty of detecting communities on the
web, since the communities are often somewhat related to each other. We
think that these relations show the future directions of the communities.

As another use of links, Kleinberg [59.5] and Brin and Page [59.6] used the
link structure for ranking web pages. Their main idea is based on mutual
reinforcement: the more a web page is referred to, the more authoritative the
web page becomes, and the more authoritative a web page becomes, the
higher the web page ranks. The highly ranked web pages tend to be the
representative web pages of communities.
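
To make the mutually reinforcing idea concrete, the following is a minimal sketch in the spirit of Kleinberg's hubs-and-authorities iteration. It is a simplified illustration only, not the exact algorithm of [59.5] or [59.6]; the function name `authority_scores`, the dictionary representation of the link structure, and the fixed number of iterations are assumptions made here.

```python
def authority_scores(links, iterations=50):
    """links: dict mapping a page to the set of pages it links to (assumed format)."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page is authoritative if it is referred to by good hubs.
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ()))
                for p in pages}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # A page is a good hub if it refers to authoritative pages.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return auth, hub

# Toy link structure: three pages refer to "x", only one of them also to "y".
links = {"h1": {"x", "y"}, "h2": {"x"}, "h3": {"x"}}
auth, hub = authority_scores(links)
print(max(auth, key=auth.get))  # the most referred-to page, "x"
```

In this toy example the most referred-to page receives the highest authority score, which is the sense in which highly ranked pages can serve as representatives of communities.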



59.2.2 Discovery of Future Directions 

In the broad sense, future directions refer to meaningful relations among 
communities in various scenes. Focusing on the WWW, Matsumura et al. [59.8]
discovered promising new topics on the web by visualizing new combinations
of communities sharing common topics. Ohsawa et al. [59.11] proposed
KeyGraph, which is an algorithm for extracting assertions based on a co-occurrence
graph of terms from textual data. KeyGraph visualizes the relations between 
assertions and foundations to help us understand potential needs or demands. 
Accordingly, KeyGraph can be applied to show the future directions of tex- 
tual data. 

As for the human relations in communities, Kautz et al. [59.9] created
REFERRAL WEB, a social network graph designed to find an expert who
is both reliable and likely to respond to the user. Also, Foner et al. [59.10]
described a matchmaker system named Yenta for finding people with similar
interests and introducing them to each other. Both systems reveal the potential
relations between individuals; therefore, they show the future directions of
individuals. Maarek et al. [59.12] implemented WebCutter, which outputs a
tailored map of the web according to the user-specified interests. The map is
one suggestion of the future directions of the user because it shows
essentially related web pages.



59.3 Future Directions of Communities

For the discovery of new topics on the web, we aim to discover the future 
directions of communities and to understand the potential needs or demands. 
In this section, we first present an overview of our idea, and then describe
our approach in detail. 



59.3.1 How to Discover the Future Directions? 

Our approach for discovering the future directions is based on link analysis 
because links can be more reliable information than terms (see 59.2.1). The 
outline of our process consists of five phases as follows: 

Phase 1. Collect web pages.

Phase 2. Discover communities on the web.

Phase 3. Discover established relations among the communities.

Phase 4. Discover future directions among the communities.

Phase 5. Visualize the future directions.

The accurate definition of a community on the web is an essential problem
by itself. In Phase 2, following Kumar’s definition [59.4], we expediently define
a simple bipartite graph as a community, where a community consists of a
much-cited web page and its surrounding web pages. Next, we focus on the
property of the web that communities are often somewhat related to each
other because a web page often belongs to several communities. In our view,
the relations may include established (well-known) relations as well as the
future directions of these communities. The degree of relation between two
communities can be measured by the number of web pages included in both
communities. This idea is based on the co-citation concept that originated in
bibliometrics [59.7]. In this way, we regard strong relations as established
relations in Phase 3, and weak relations as the future directions in Phase 4.
Our idea is graphically shown in Fig. 59.1. Considering the fact that an
established link arises only when a future direction grows, focusing on future
links is useful for understanding where the changes happen.

Fig. 59.1. An overview of the web. Each cluster corresponds to a community
sharing a common interest. Communities often share interests with each
other. Here, solid lines mean established relations and dotted lines show future
directions.



59.3.2 The Detailed Process 

Here, we describe our approach sketched in 59.3.1 in detail. 

Phase 1. Preparations: First of all, let a user decide on a target area/topic
whose future directions s/he wants to explore. Then, source web pages D
are collected by using Google¹. Here, the first 500 web pages of Google’s
output for the query are downloaded.

Phase 2. Discover Communities: To survey the overall picture of communities
and discover the future directions among them, we make use of
only the central web page of each community instead of all the web pages.
The central web page, named the core-page, is extracted as follows.

1. Count the frequency of links included in D.

2. Regard the top $N_1$ links C as the ‘core-pages’ of communities.

Phase 3. Discover Established Relations: Measure the relations among
core-pages by counting the number of co-citations, and regard strong relations
as established links. The process is as follows.

1. For every pair of two core-pages in C, count the number of links
included in both the core-pages.

2. Regard the top $N_2$ pairs as established links $L_1$ (solid lines in
Fig. 59.1).

Phase 4. Discover Future Directions: Measure the relations among core-pages
by counting the number of co-citations, and regard weak relations as
future links. The process is as follows.

1. For every pair of two core-pages in C except for those in $L_1$, count the
number of links included in both the core-pages.

2. Regard the top $N_3$ pairs as future links $L_2$ (dotted lines in Fig. 59.1).

The movement of communities is shown by established and future relations.
Therefore, future directions are expressed by the combination of
these two kinds of relations.

Phase 5. Visualization: The core-pages and their relations (C, $L_1$, and $L_2$)
are visualized in a 2-dimensional interface to piece together the connections
of communities and to understand the potential needs or demands.
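
The counting steps of Phases 2 to 4 can be sketched as follows. This is a hedged illustration under stated assumptions, not the implementation of ChanceFinder: the function name `chance_links` and the dictionary representation of the collected pages are hypothetical, the co-citation count of two core-pages is taken to be the number of source pages that link to both of them, pairs that are never co-cited are not treated as links, and ties are broken arbitrarily.

```python
from collections import Counter
from itertools import combinations

def chance_links(pages, n1, n2, n3):
    """pages: dict mapping each source page in D to the set of URLs it links to."""
    # Phase 2: the n1 most frequently linked URLs become the core-pages C.
    freq = Counter(url for links in pages.values() for url in links)
    cores = [url for url, _ in freq.most_common(n1)]

    # Assumed co-citation measure: number of source pages linking to both cores.
    cocite = Counter()
    for links in pages.values():
        cited = sorted(c for c in cores if c in links)
        cocite.update(combinations(cited, 2))

    ranked = [pair for pair, _ in cocite.most_common()]
    established = ranked[:n2]        # Phase 3: strongest pairs (L_1)
    future = ranked[n2:n2 + n3]      # Phase 4: next strongest pairs (L_2)
    return cores, established, future

# Toy data: four source pages and the URLs they link to.
pages = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b"},
    "p3": {"a", "b", "c"},
    "p4": {"c", "d"},
}
cores, established, future = chance_links(pages, n1=3, n2=1, n3=2)
print(established)  # [('a', 'b')]
```

In the toy data, the pair ('a', 'b') is co-cited by three source pages and becomes the single established link, while the two weaker pairs are reported as future links.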

¹ Google is a search engine to which Brin and Page’s algorithm [59.6] is applied.
Google is available at http://www.google.com/. 



59.4 Experiments and Discussions

We have implemented a prototype system named ChanceFinder on a Sun 
Enterprise450 with perl5 and Perl/Tk. ChanceFinder visualizes future
directions. In this section, we show three experiments with ChanceFinder using
$N_1 = 30$, $N_2 = 29$, and $N_3 = 10$, and discuss them (these experiments were
carried out on 17 January 2001).



59.4.1 Future Directions of Portal Sites 

The output of ChanceFinder for input query ’Portal Site’ is shown in Fig. 
59.2. Each node stands for a community, and each white node in particular
represents a core with many future links. Strong relations among communities are
expressed by thick lines (established links), and promising future directions of
communities are shown by thin lines (future links). The URL below each node
shows the core of each community. Considering the near future, future links 
might change into established links or disappear. In either event, we should 
focus on only future links to predict the future. That is to say, the output 
shows the present and future map of communities. 

We can perceive three clusters in Fig. 59.2. The lower right-hand cluster
consists of 4 major portal sites: ‘Yahoo!’, ‘Infoseek’, ‘Excite’, and ‘Lycos’.
The cluster is considered to be mature since every node is linked to every other by
established links, and this assumption actually matches well-accepted norms.




Fig. 59.2. An output of ChanceFinder for input query ’Portal Site’. 

All the communities in the lower left-hand cluster are strongly related to
‘Bfound.co.uk’, which is a company conducting web design, internet solutions,
and e-commerce. This cluster seems to be a community in early development.

The upper middle cluster consists of web pages belonging to ‘internet.com’
communities. According to 100hot.com², which is the Web’s leading ranked
directory where the rankings are based on the Internet habits of more
than 100,000 Web surfers each month, internet.com ranked 77th on the date
of the experiment. This means that ‘internet.com’ is not a major portal site
at present. However, we can see that the cluster is in energetic development 
because the cluster is composed of 13 communities, 17 established links, and 
8 future links. 

59.4.2 Future Directions of Book Site 

From the output of ChanceFinder for input query ’Book Site’ shown in 
Fig. 59.3, we can easily recognize one big cluster and two tiny clusters.




Fig. 59.3. An output for ’Book Site’. 



The upper-middle cluster is composed of two ’bookwire.com’ sites and one 
’abebooks.com’ site. The former is the book industry’s most comprehensive 
and thorough online information source, and the latter is the world’s largest
source of out-of-print books. That is, this cluster shows information sources of
books. 

² http://www.100hot.com

The upper-right cluster includes two communities of ‘mcgraw-hill.com’
sites. These sites look like a tiny cluster at first sight, but they are the web
pages of the McGraw-Hill company, which is a time-honored publisher founded in
1909. Therefore, this cluster represents a well-established community.

The largest cluster comprises 14 About.com communities. The cluster 
seems to be densely connected already, since it has 25 established links and
11 future links. In fact, in the report ‘Portals leapfrog up Media
Metrix chart of the Web’s top sites’³ from December 1999, About.com is
described as follows:

Excite@Home Corp., NBC Internet Inc. and About.com Inc. are on 
the rise, according to the latest traffic numbers from Internet mea- 
surement firm Media Metrix Inc. 

However, About.com still seems to be a minor web site in the area of ‘Book Site’
(About.com does not appear in the rankings of 100hot.com). For these
reasons, About.com is considered to be struggling to expand its influence,
and this interpretation can be read from Fig. 59.3.

Interestingly, ‘amazon.com’, the most famous, giant book site, stands
alone in the middle-right of Fig. 59.3. This implies that almost all the
communities rival each other, and Fig. 59.3 clearly shows this situation.



59.4.3 Future Directions of Artificial Intelligence 

The output for input query ‘Artificial Intelligence’ is shown in Fig. 59.4.
Viewing Fig. 59.4, we can recognize only one big chunk of communities, in which 20
communities, 28 established links, and 11 future links are densely connected.
Fig. 59.4 is essentially different from the two examples above in that
the cluster in Fig. 59.4 consists of different communities. This may show the
maturity of the area of ’Artificial Intelligence’. If this assumption is true, 
we must seek a new area which collaborates with ’Artificial Intelligence’ to 
create future directions. 



59.5 Conclusions 

In this paper, we first emphasized the importance of discovering new topics.
Then, we described the idea of discovering future directions of communities by
chaining primitive communities to understand potential needs or demands.
Through some experiments and their evaluations, we showed that ChanceFinder
indeed shows the relations among communities. However, we expect that
whether the relations really become the future directions depends on the
user’s vision or imagination based on accurate information.

® http://www.zdnet.com/zdnn/stories/news/0,4586,2424687,00.html

Fig. 59.4. An output for ’Artificial Intelligence’.

A document is represented by a network; the nodes represent terms, and
the edges represent the co-occurrence of terms. This paper shows that the
network has the characteristics of a small world, i.e., it is highly clustered
and has a short path length. Based on the topology, we can extract important
terms, even if they are rare, by measuring their contribution to the graph
being a small world.



60.1 Introduction 

Graphs that occur in many biological, social and man-made systems are often 
neither completely regular nor completely random, but have instead a “small 
world” topology in which nodes are highly clustered yet the path length 
between them is small [60.7, 60.5]. Watts and Strogatz have shown that a 
social graph (the collaboration graph of actors in feature films), a biological 
graph (the neural network of the nematode worm C. elegans), and a man- 
made graph (the electrical power grid of the western United States) all have 
a small world topology [60.7, 60.6]. The World Wide Web also forms a small
world network [60.1].

In this paper, we first show that the graph derived from a document has
small world characteristics. Then we develop a new algorithm to find important
terms by measuring each term’s contribution to making the world small.



60.2 Small World 

We consider an undirected, unweighted, simple, sparse and connected graph. (We
extend the treatment to unconnected graphs in Section 60.4.) To formalize the
notion of a small world, Watts and Strogatz define the clustering coefficient
and the characteristic path length [60.7, 60.6]:

- The characteristic path length, L, is the path length averaged over all pairs
of nodes. The path length d(i, j) is the number of edges in the shortest
path between nodes i and j.

- The clustering coefficient is a measure of the cliqueness of the local
neighbourhoods. For a node with k neighbours, at most $_kC_2 = k(k-1)/2$
edges can exist between them. The clustering of a node is the fraction of
these allowable edges that occur. The clustering coefficient, C, is the average
clustering over all the nodes in the graph.
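The two measures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the paper: the graph is given as a dictionary of neighbour sets, it is assumed to be connected, a node with fewer than two neighbours is given clustering 0 (a convention chosen here), and the toy graph is hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_path_lengths(adj, source):
    """Breadth-first search: number of edges from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def characteristic_path_length(adj):
    """L: d(i, j) averaged over all unordered pairs (graph assumed connected)."""
    nodes = list(adj)
    total, pairs = 0, 0
    for idx, u in enumerate(nodes):
        dist = shortest_path_lengths(adj, u)
        for v in nodes[idx + 1:]:
            total += dist[v]
            pairs += 1
    return total / pairs

def clustering_coefficient(adj):
    """C: average over nodes of (edges among neighbours) / (k(k-1)/2)."""
    values = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            values.append(0.0)  # assumed convention for nodes with k < 2
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        values.append(links / (k * (k - 1) / 2))
    return sum(values) / len(values)

# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
L = characteristic_path_length(toy)
C = clustering_coefficient(toy)
```

For the toy graph, the six pairwise distances sum to 8, so L = 8/6, and the node clusterings are 1, 1, 1/3 and 0, so C = 7/12.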

Watts and Strogatz define a small world graph as one in which $L \gtrsim L_{rand}$
(or $L \approx L_{rand}$) and $C \gg C_{rand}$, where $L_{rand}$ and $C_{rand}$ are the
characteristic path length and clustering coefficient of a random graph with the
same number of nodes and edges. They propose several models of graphs, one of
which is called β-Graphs. Starting from a regular graph, they introduce
disorder into the graph by randomly rewiring each edge with probability p, as
shown in Fig. 60.1. If p = 0 then the graph is completely regular and ordered.
If p = 1 then the graph is completely random and disordered. Intermediate
values of p give graphs that are neither completely regular nor completely
disordered. They are small worlds.
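The rewiring procedure can be sketched as follows. This is a simplified illustration, not the exact β-Graph construction of Watts and Strogatz: the function names are hypothetical, one endpoint of each edge is redirected to a uniformly chosen node with probability p, and self-loops and duplicate edges are avoided. Note that rewiring may disconnect the graph.

```python
import random

def ring_lattice(n, k):
    """Regular ring: each node is linked to its k nearest neighbours (k even)."""
    edges = set()
    for i in range(n):
        for step in range(1, k // 2 + 1):
            edges.add(frozenset((i, (i + step) % n)))
    return edges

def rewire(edges, n, p, rng):
    """With probability p, redirect one endpoint of each original edge."""
    result = set(edges)
    for edge in sorted(edges, key=sorted):  # fixed order for reproducibility
        if rng.random() >= p:
            continue
        u = min(edge)  # endpoint that stays in place
        candidates = [w for w in range(n)
                      if w != u and frozenset((u, w)) not in result]
        if not candidates:
            continue
        result.remove(edge)
        result.add(frozenset((u, rng.choice(candidates))))
    return result

lattice = ring_lattice(20, 4)                                    # p = 0: regular
small_world = rewire(lattice, 20, 0.1, random.Random(0))  # intermediate p
random_like = rewire(lattice, 20, 1.0, random.Random(0))  # p = 1: disordered
```

The number of edges is preserved for every p, so graphs generated at different values of p can be compared on L and C.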



[Figure: three ring graphs labelled Regular (p = 0), Small world, and Random (p = 1), ordered by increasing randomness.]

Fig. 60.1. Random rewiring of a regular ring lattice.





Walsh defines the proximity ratio $\mu = (C / L) / (C_{rand} / L_{rand})$ as the
small-worldliness of the graph [60.5]. $\mu$ is larger than 1 in graphs with a
small world topology.



60.3 Term Co-occurrence Graph 

A graph is constructed from a document as follows. We first preprocess the
document by stemming and removing Salton’s stop words. We apply n-grams
to count phrase frequency. Then we regard the title of the document, each
section title and each caption of figures and tables as a sentence, and exclude
all the figures, tables, and references. We get a list of sentences, each of which
consists of words (or phrases).

Then we pick up frequent terms, which appear over a user-given threshold of
$f_{thre}$ times, and fix them as nodes. For every pair of terms, we count the
co-occurrence over all sentences, and add an edge if the Jaccard coefficient
exceeds a threshold, $J_{thre}$.^
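The graph construction can be sketched as below. This is a minimal illustration under assumptions, not the authors' code: sentences are assumed to be already stemmed and stripped of stop words, term frequency is taken as the total number of occurrences, "over the threshold" is read as strictly greater than, and the function name and sample sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(sentences, f_thre, j_thre):
    """sentences: list of term lists. Returns (nodes, edges)."""
    # Nodes: terms whose frequency is over f_thre (strict inequality assumed).
    freq = Counter(term for sentence in sentences for term in sentence)
    nodes = sorted(t for t, f in freq.items() if f > f_thre)
    # For each node, the set of sentence indices in which it occurs.
    occurs = {t: {i for i, s in enumerate(sentences) if t in s} for t in nodes}
    edges = set()
    for a, b in combinations(nodes, 2):
        both = len(occurs[a] & occurs[b])    # sentences containing both terms
        either = len(occurs[a] | occurs[b])  # sentences containing either term
        if either and both / either > j_thre:  # Jaccard coefficient
            edges.add((a, b))
    return nodes, edges

# Hypothetical preprocessed sentences.
sentences = [
    ["small", "world", "graph"],
    ["small", "world", "term"],
    ["graph", "term", "node"],
    ["small", "graph"],
]
nodes, edges = cooccurrence_graph(sentences, f_thre=1, j_thre=0.5)
```

In this toy input only "small" and "world" co-occur often enough (Jaccard coefficient 2/3) to be linked; "graph" and "small" reach exactly 0.5 and are therefore not linked.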

Table 60.1 shows statistics of the small-worldliness of 57 graphs, each
constructed from a technical paper that appeared at the 9th International World
Wide Web Conference (WWW9) [60.8]. From this result, we can conjecture that
these papers do have small world structures. However, the small-worldliness
varies from paper to paper.



Table 60.1. Statistical data on proximity ratios μ for 57 graphs of papers in
WWW9.

        L      Lrand   C      Crand   μ
Max.    4.99   3.58    0.38   0.012   22.81
Ave.    5.36   -       0.33   -       15.31
Min.    8.13   2.94    0.31   0.027   4.20



We set $f_{thre} = 3$. We restrict attention to the giant connected component of the
graph, which includes 89% of the nodes on average. We exclude three papers in which
the giant connected component covers less than 50% of the nodes. We do not show
$L_{rand}$ and $C_{rand}$ for the average case, because n and k differ depending on the
target paper. On average, n = 275 and k = 5.04.
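As a consistency check, the proximity ratio (C/L)/(Crand/Lrand) can be recomputed from the rounded entries in the Max. and Min. rows of Table 60.1. The small gaps to the reported values 22.81 and 4.20 are presumably due to rounding of the tabulated inputs; that explanation is an assumption, not a statement from the paper.

```latex
\mu_{\mathrm{Max}} = \frac{C/L}{C_{rand}/L_{rand}}
                   = \frac{0.38/4.99}{0.012/3.58} \approx 22.7,
\qquad
\mu_{\mathrm{Min}} = \frac{0.31/8.13}{0.027/2.94} \approx 4.15
```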



One possible reason why a paper has a small world structure is that
the author mentions some concepts step by step (forming clusters of
related terms), and then tries to merge the concepts and build
up new ideas (making ‘shortcuts’ between clusters). The author will keep in mind
that the new idea should be firmly connected to the fundamental concepts, but not
redundantly.



60.4 Finding Important Terms 



Admitting that a document is a small world, how does it benefit us? We try 
here to estimate the importance of a term, and pick up important terms, 
though they are rare in the document, based on the small world structure. 
We consider ‘important terms’ as the terms which reflect the main topic, the 
author’s idea, and the fundamental concepts of the document. 

Below we show a series of definitions to measure the importance of a term. 

Definition 60.4.1. An extended path length d'(i, j) of nodes i and j is
defined as follows.

$$ d'(i,j) = \begin{cases} d(i,j), & \text{if } (i,j) \text{ are connected}, \\ w_{sum}, & \text{otherwise}. \end{cases} \qquad (60.1) $$



^ In this paper, we set $J_{thre}$ so that the number of neighbors, k, is around 4.5 on
average. The Jaccard coefficient is simply the number of sentences that contain
both terms divided by the number of sentences that contain either term. This
idea is also used in constructing a referral network from WWW pages [60.2].

where $w_{sum}$ is a constant, the sum of the widths of all the disconnected
subgraphs, and d(i, j) is the length of the shortest path between i and j in a
connected graph.

Definition 60.4.2. The extended characteristic path length L' is the extended
path length averaged over all pairs of nodes.

Definition 60.4.3. $L'_v$ is the extended path length averaged over all pairs of
nodes except node v. $L'_{G_v}$ is the extended characteristic path length of the
graph without node v.

Definition 60.4.4. The contribution, $CB_v$, of the node v to making the world
small is defined as $CB_v = L'_{G_v} - L'_v$.

If a node v with large $CB_v$ is absent from the graph, the graph gets very large.
In the context of documents, the topics become divided. We assume that such a term
helps merge the structure of the document and is thus important.
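The contribution of a node can be sketched as follows. This is an illustrative reading of Definitions 60.4.1 to 60.4.4, not the authors' implementation: the constant w_sum is passed in as a parameter rather than derived from the graph, the function names are hypothetical, and the toy graph and the value w_sum = 5 are made up for the example.

```python
from collections import deque

def bfs(adj, source, removed=None):
    """Shortest path lengths from source, never stepping onto `removed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w != removed and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def extended_mean_length(adj, w_sum, skip, removed=None):
    """Average extended path length d'(i, j) over all pairs not involving `skip`."""
    nodes = [n for n in adj if n != skip]
    total, pairs = 0.0, 0
    for idx, i in enumerate(nodes):
        dist = bfs(adj, i, removed)
        for j in nodes[idx + 1:]:
            total += dist.get(j, w_sum)  # w_sum when i and j are disconnected
            pairs += 1
    return total / pairs

def contribution(adj, v, w_sum):
    """CB_v: path length with v deleted minus path length with v kept."""
    l_v = extended_mean_length(adj, w_sum, skip=v)              # paths may pass through v
    l_gv = extended_mean_length(adj, w_sum, skip=v, removed=v)  # v deleted from the graph
    return l_gv - l_v

# Hypothetical toy graph: c - a - v - b, where v bridges b to the rest.
toy = {"a": {"v", "c"}, "v": {"a", "b"}, "b": {"v"}, "c": {"a"}}
cb_v = contribution(toy, "v", w_sum=5)
cb_c = contribution(toy, "c", w_sum=5)
```

Here removing v cuts b off from a and c, so the average extended path length rises from 2 to 11/3 and CB_v = 5/3, whereas removing the pendant node c changes nothing and CB_c = 0.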



60.5 Example 

We show an example computed on [60.4], i.e. the longer version of this paper.
Table 60.2 shows the frequent terms and Table 60.3 shows the important
terms measured by $CB_v$. Comparing the two tables, the list of important terms
includes the author’s ideas, e.g., important term and contribution, as well as
important basic concepts, e.g., cluster and coefficient, although they are
rare terms. In contrast, the list of frequent terms simply shows the components
of the paper, and is not of interest.



Table 60.2. Frequent terms.

Term          Frequency
graph         39
small         37
world         37
term          34
small world   30
node          29
paper         21
length        21
document      19
edge          19

Table 60.3. Terms with 10 largest CBv.

Term             CBv    Frequency
small            3.05   37
term             2.80   34
important term   1.93   7
contribution     1.64   6
node             1.00   29
make             0.82   6
cluster          0.57   15
graph            0.54   39
coefficient      0.52   8
average          0.50   8



Lastly, Fig. 60.2 shows a graphical visualization of the world of this
paper. (Only the giant connected component of the graph is shown, though
other parts of the graph are also used for calculation.) We can easily point out
the terms without which the world would be separated, such as small and important
term.

Fig. 60.2. Small world of the paper.



60.6 Conclusion 

We expect our approach to be effective not only for document indexing, but also
for other graphical representations. Finding structurally important parts may
bring us a deeper understanding of the graph, new perspectives, and chances
to utilize it. A change that makes the world very small may sometimes be
very important.

Social activities are divided into two types. One is creative activity through the
combination of existing objects, and the other is imitative activity through
which the created matter settles. Since a creation is realized by the
combination of existing knowledge and information, people cannot create new
things without thinking about previous works and their proper combinations.
This paper proposes a framework for supporting creative activity by
combinations.



61.1 Introduction 

Social activities are divided into two types. One is creative activity through the
combination of existing objects, and the other is imitative activity through
which the created matter settles. As human beings by nature tire of things,
people always seek novel things. Therefore, it can be said that society
is sustained by mutual creation. Generally, since a creation is realized by the
combination of existing knowledge and information, people cannot create new
things without thinking about previous works and their proper combinations.
The framework described in this paper supports the discovery of unknown
relations and combinations of information concealed in the WWW database. By
using this framework, one will be able to find a new theme of study or
to create a hot-selling product.



61.2 Framework for Creative Activity 

A framework for creative activities is shown in Fig. 61.1. This framework
consists of a User, a Search System, a Data Mining System and an Interface for
supplying knowledge. The search system in the framework takes search keywords as
the user’s input, and outputs relational keywords arranged in a two-dimensional
interface after a data mining process for extracting useful information.
Users make their own interests concrete, acquire relational information and
carry out creative activities by discovering unknown relationships through the
repetition of searches and the supply of keywords. Relational keywords are
extracted from Web pages matched with each of the user’s search keywords. In the
interface, the system supplies not only keywords but also some additional
information such as a summary of the output Web pages. In the rest of this
section, each of these components is defined and explained.

Fig. 61.1. A System for Creative Activities

61.2.1 User Discovers a Viewpoint of the Combination 

In the current world, the information in the WWW is unknown to a user but
known to the author of each Web page. In other words, a new idea comes
from a combination of known ideas. A brand new combination may have a
brand new viewpoint. However, the number of combinations is so
enormous that a person cannot match objects by hand. Therefore, we propose a
system which provides users with keywords as chances that can become
viewpoints.

Such a viewpoint requires an agreement or a disagreement between the
two things that are combined. This is realized by words that determine whether
the two agree or disagree. In short, discovering this
viewpoint is a discovery for creation. This may be a kind of discovery by
co-occurrence [Langley 87]. For example, two companies will be merged from the
viewpoint that a company must survive or must pursue profits. Therefore,
this framework aims at constructing a new system which helps users discover
new combinations by suggesting viewpoints that represent an agreement or
a disagreement. However, nothing will come without input from the user’s mind or
interests. So users use Internet search systems with latent and vague desires
in their minds.

61.2.2 Support System for Search Systems 

Most users of the Internet want to acquire information that they did not
know. However, it is hard for users to choose proper search keywords in
a domain they do not know well. A system which aids the search
process through interactions between a user and the system is therefore useful.
Namely, such a system will:

1. Make a user’s interest concrete.

2. Aid a user to acquire relational information.

3. Aid a user to discover unknown relationships among search keywords.

In this paper, along with points 1 and 2, point 3 is the most notable.
In other words, this system helps users develop new goods or find a
new breakthrough in their study by inputting the user’s original combination of
search keywords.

The three creative purposes of using search engines, as in Fig. 61.1,
are investigation, verification and extension. As these keywords are already
described in 61.1, Investigation and Verification aim at focusing information,
and Extension aims at extending information related to a search keyword.
Therefore, a support system for a search engine is not itself a search engine
but a contrivance to support the above purposes. The contrivance is realized
by Data Mining and an interface for displaying knowledge.

61.2.3 Data Mining from Web Pages 

Data Mining [Fayyad 96] seeks useful rules and knowledge in an enormous
data warehouse. Some of them are derived from association rules [Agrawal 94]
and conditional probabilities defined by the co-occurrence of data. In the
Data Mining module of the framework, relational keywords of the search
keywords are extracted from current Web pages, and relational, summarized
and otherwise useful knowledge is expected to be supplied. The features
of the Web database are as follows:

1. Enormous: Not all data can be used.

2. Dynamic: The data is always changing.

3. Heterogeneous: A piece of data includes several topics and viewpoints.

A method to cope with these features is needed to construct a
system. The most important point is how to restrict the input data, for example to
pages in specific domains, pages retrieved by a keyword (including a specific
keyword), fresh pages and so on. The same considerations
apply to a single Web page. That is, a constructor must think about how to
divide and how to interpret a page to extract the essence each user desires.

61.2.4 Interface for Knowledge Refinement 

It is important for a user to understand the relationships between search keywords
and existing Web information. A two-dimensional search interface is needed
to grasp tendencies of Web information, to make a search condition concrete
and to get an idea for a new topic. Therefore, relational keywords, as
outputs of this framework, are arranged neatly in a two-dimensional search
interface. In practice, though keywords are the ultimate summarized information
of Web pages, they are fragments of sentences. Some users may want a
summarized sentence that chains the keywords together, because a word has various
meanings. Therefore, some complementary information will also be useful.
The interface for interactive discovery needs some components which make
up for each loss caused by the restriction of the data. Finally, this framework
supplies an interface that is wrapped in a graphical user interface.



61.3 Experimental System 

Currently, though the prototype system is under construction, each module
already works separately [Sunayama 99, Sunayama 00, Sunayama 01].

At present, the Data Mining method for extracting relational keywords is to
select keywords that commonly appear in Web pages including a search keyword
[Sunayama 99]. However, this is not a complete method because a word has
various meanings, and because a word may not be used in a way that suits the
user’s interest. A keyword in a Web page should be extracted as a topical keyword
of both the page and the user.
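The basic selection step, picking keywords that commonly appear in pages containing a search keyword, might look like the sketch below. It is only an illustration under assumptions and not the method of [Sunayama 99]: pages are assumed to be already reduced to keyword lists, "commonly appear" is approximated by the number of matching pages containing the word, and the function name and sample pages are hypothetical.

```python
from collections import Counter

def relational_keywords(pages, search_keyword, top_n=3):
    """pages: list of keyword lists, one per Web page."""
    # Keep only the pages that include the search keyword.
    matched = [set(page) for page in pages if search_keyword in page]
    # Count, for every other word, how many matched pages contain it.
    counts = Counter()
    for words in matched:
        counts.update(words - {search_keyword})
    # Rank by page count (descending), then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]

# Hypothetical pages, already reduced to keyword lists.
pages = [
    ["movie", "film", "actor"],
    ["movie", "film", "festival"],
    ["movie", "actor", "award"],
    ["music", "festival"],
]
related = relational_keywords(pages, "movie", top_n=2)
```

A plain page count like this ignores word sense and the user's interest, which is exactly the limitation noted above.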

Panoramic View System [Sunayama 01] extracts topic keywords which
depend on the keywords the user has given. Therefore, these two methods for
acquiring relational keywords will be combined in the new system. We would like
to have some additional information for grasping information on the Web. Panoramic
View System can acquire key sentences of Web pages. As a result, users can
choose information quickly and can get more information.

A two-dimensional search interface has already appeared in [Sunayama
00]. Relational keywords were arranged in a two-dimensional interface, and
users could easily make out the relationship between search keywords and
relational keywords. In Fig. 61.2, the search query is “(CM OR Movie) AND
Film AND Popular AND (Ryoko OR Hirosue)”, and relational keywords are
arranged. Search keywords are clustered by common relational keywords,
so the two keywords “(CM OR Movie)” and “Film” are clustered in the same
category. For some keywords arranged in the interface, it will be hard to explain
why they are output as relational keywords. Such keywords will
be treated as unknown or new common points of the search keywords. Then, the
user will examine in detail why these keywords were output.
In general, the keywords become a starting point of an examining process to find a
new research topic or to find an administration strategy.

This cycle of searching and acquiring information will give birth to new
ideas. This system supplies users with chances for knowing trends of the
world, because the user comes to know previously unknown
viewpoints of the combination.

Fig. 61.2. Search Interface for Interactions



61.4 Conclusion 

This paper proposed a framework for creative activities. Creative activities
are necessary for our everyday life and an affluent society. In particular, it is
effective to suggest to people new viewpoints of combinations which have never
been thought of. Certain symbolic words are useful for users to make their
ideas concrete, to learn relational knowledge and to expand their ideas. May
this new century be a creative century!

We propose a novel method and its implementation to support long-term
idea-generation in everyday life. Our system consists of two components:
a management system for problems and ideas named IdeaManager, and a
personal information storage system named iBox. Treating the input of
information into iBox as a clue event for idea-generation, iBox searches for
related problems and ideas in IdeaManager and presents the results, if any. Its aim
is to support non-intentional idea-generation. In a long-term user study, we
confirmed the feasibility of our approach.



62.1 Introduction 

Since the end of the 1980s, a number of systems to support idea-generation,
called creativity support systems, have been proposed [62.4]. However, most of
them have not gained widespread use. The authors believe this is because
they support isolated aspects separated from professionals’ daily activities
[62.2]. They only support short-term thinking in front of the system, and they
assume that users use them while consciously generating ideas. However,
considering our experiences and prior cases of idea-generation, it is clear that a
person needs to think for a long time. It is rarely necessary to generate ideas
immediately, and most problems or themes allow sufficient time to acquire
satisfactory ideas. Moreover, ideas arise suddenly, at times when a person is
not consciously trying to generate them, more often than they arise during
intentional thinking [62.5].

Based on these claims, we have been pursuing a system to support
long-term creative thinking in everyday life. Our everyday life is filled with
stimuli. They might work as clues to generate ideas. They are chances to generate
ideas. Based on the thought, ”A chance is not what is given. A chance is what
we should get by ourselves”, we propose a system that amplifies chances to
generate ideas.

To begin with, we describe our system. Its characteristic is that it cooperates
with a personal information storage system used in daily life. It supports
activities where information management plays an important role and original
ideas are needed, for example those of researchers or planners. Next, we explain
the results of a long-term user study of the system.

62.2 System Overview

Based on the observation of actual idea-generation, we have built a system
to support non-intentional idea-generation in long-term daily activities (refer
to [62.5] for details). Our system consists of two components: a management
system for problems and ideas named IdeaManager (Figure 62.1), and a
personal information storage system named iBox (Figure 62.2). They run on
Windows 95/98/NT 4.0/2000 and are implemented using the search engine
of Albase^ [62.3].

Fig. 62.1. A screen shot of IdeaManager



IdeaManager. In long-term idea-generation, a person tries to seek ideas 
and refine them many times until he or she acquires satisfactory ones. Here, 
for the next trial of idea-generation, he or she must recall the problem. In 
order to avoid forgetting problems and their ideas, IdeaManager supports the 
retention and management of them. 

All information in IdeaManager has a name and keywords. In the
current version, only text can be stocked. Information stocked in IdeaManager
is divided into the following three types: problems, ideas, and related information.
Information is stocked in a corresponding window. Users can view problems,
ideas, and related information side by side. Also, using the link function, users
can manage problems together with their corresponding ideas and related
information.

IdeaManager provides the following basic search functions: search by
keywords, full text search, search by date, and list of all information. Also,
users can filter information using attributes of the information, for example,
deadlines, importance, and so on. Each of these functions returns a list of names.
By selecting a name from the list, users can see the information with its name and
keywords.

^ Albase is now sold by Fuji Xerox Co., Ltd. as Johobako 4.0.

Fig. 62.2. A screen shot of iBox

iBox. iBox is a personal information storage system used in various types of
situations in both work and other everyday life. In fact, in our laboratory,
almost all students use iBox in their daily activities.

Like IdeaManager, iBox stocks all text information with its name
and keywords. iBox provides the same basic search functions as IdeaManager.
Each of these functions returns a list of names. By selecting a name from the list,
a user can see the information with its name and keywords.

Cooperation between IdeaManager and iBox. Our system provides
two types of cooperation. A registration of information in one application
triggers a search of information in the other application and presents the results
(Figure 62.3).

Information stocked in iBox reflects a user’s interests. Such information
may have something to do with the user’s current problems and work as a hint
for considering these problems. When information is registered in iBox,
iBox searches for related problems and ideas in IdeaManager and pops up the
results, if any. We hope that users will be able to generate or enhance ideas for
the retrieved problems or ideas using the registered information as a hint.
Information stocked in iBox must have novelty or emergence, which Finke et al.
[62.1] call ’preinventive properties’. A trial of idea-generation at this timing
leads to a function-follows-form approach to idea-generation described by Finke
et al.

Also, when users recognize a problem, presenting related information
might stimulate their consideration. Thus, when a problem or an idea is
registered in IdeaManager, IdeaManager searches for related information in iBox
and pops up the results, if any. We hope that users will be able to generate
or enhance ideas for the registered problems or ideas using the retrieved
information as a hint.
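The two-way cooperation can be sketched as below. This is a hypothetical model, not the actual IdeaManager or iBox code (which is built on the Albase search engine): the `Store` class, its methods and the sample keywords are assumptions made for illustration, and "related" is approximated by sharing at least one keyword.

```python
class Store:
    """Hypothetical stand-in for IdeaManager or iBox: named items with keywords."""

    def __init__(self, label):
        self.label = label
        self.items = []      # list of (name, keyword set)
        self.partner = None  # the other application

    def search(self, keywords):
        """Keyword matching: names of items sharing at least one keyword."""
        return [name for name, kws in self.items if kws & keywords]

    def register(self, name, keywords):
        """Store the item, then search the partner; the result models the pop-up."""
        keywords = set(keywords)
        self.items.append((name, keywords))
        return self.partner.search(keywords) if self.partner else []

idea_manager, ibox = Store("IdeaManager"), Store("iBox")
idea_manager.partner, ibox.partner = ibox, idea_manager

# Item names follow Example 1 of the user study; the keywords are made up.
first_popup = idea_manager.register("How to search in a pop-up", {"search", "pop-up"})
popup = ibox.register("Spacing effect of study", {"memory", "search"})
```

An empty result corresponds to no pop-up being shown, as when the first item is registered while the other store is still empty.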




62.3 Long-Term User Study 

We carried out a long-term user study. The users were two researchers and one 
of them is the author of this paper. The period of study was more than seven 
months (221 days). During this period, the users always had notebook PCs 
with them and managed problems and ideas in their actual activities. Both 
IdeaManager and iBox saved action logs, for example registration, search, 
reference of information, and so on. The action logs of one of the users were 
then analyzed. 



62.3.1 Behavior Analysis on Pop-Up 

During the experimental period, the user registered 1,242 pieces of
information in iBox. During this period, iBox popped up IdeaManager 70 times, and
in 46 of these pop-ups the user referred to at least one piece of
information. He also registered 25 problems and 49 ideas. IdeaManager popped
up iBox more than 9 times^ and in 4 of these pop-ups he referred to at least
one piece of information. We analyzed his behavior during these 50 pop-ups
in which he referred to at least one piece of information. This analysis is based
on a retrospective verbal protocol obtained while reviewing his behavior using
action logs and his diary. Below we show some typical behavior in which pop-ups
worked effectively.

Example 1. The following is the first example, a pop-up driven by iBox.
(1) The user registered information named ”Spacing effect of study” (in
Japanese) in iBox. (2) Next, IdeaManager popped up and one problem and one
idea were retrieved. (3) He referred to a problem named ”How to search in
a pop-up” (in Japanese) in IdeaManager. (4) He searched for two ideas linked
to the problem of step 3. (5) He referred to the ideas of step 4. (6)
He registered an idea named ”IdeaManager has an effect of spacing effect” (in
Japanese) in IdeaManager. (7) He linked the problem of step 3 to the idea of
step 6.

In this procedure, the user generated an idea for a presented problem. 
He reported that he had realized the pop-up effectively stimulated the new 
idea. He also reported that he had tried to generate ideas to combine the 
information of iBox and the searched problem intentionally. We think that 
pop-ups driven by iBox could support intentional combination of information 
and problems. 

Example 2. The following is the second example, a pop-up driven by
IdeaManager. (1) The user registered an idea named ”Implementation of
search by ID in iBox” (in Japanese) in IdeaManager. (2) Next, iBox popped
up and 9 pieces of information were retrieved. (3) He registered a problem
named ”Desired functions for search engine of Albase” (in Japanese) in
IdeaManager. (4) He referred to some information on the API of Albase in iBox.
(5) He registered an idea named ”How to use the API of search engine of
Albase” (in Japanese) in IdeaManager. (6) He linked the problem of step 3
to the idea of step 5.

In this procedure, the user generated an idea by referring to information
presented by iBox. He reported that he had realized that the idea of step 1 could
be realized by adapting other APIs of Albase and that his desired function could
be replaced by a combination of other APIs. He also reported that the idea was
only a small improvement and that he could probably have generated the same idea
without the pop-up function. However, in this case, the pop-up function prompted
him to seek ideas then and there. Without the pop-up function, he might
not have tried to seek ideas for the registered problem and might have left it as
is. We think pop-ups driven by IdeaManager could support spontaneous thought
when users recognize a problem.

^ There was a bug in the log. In fact, there were more pop-ups driven by
IdeaManager.

62.3.2 Effects and Open Problems

From this user study alone, we cannot conclude whether our system worked
effectively or not. However, there were a number of indicators of long-term
creativity support. We observed the following effects of our system. 1) Users
can feel relief because they can leave the management of problems and ideas
to IdeaManager. 2) Pop-ups driven by iBox encourage users to think of an
intentional combination of a problem and information (Example 1). 3)
Pop-ups driven by IdeaManager encourage users not only to register a
problem but also to try to think and generate ideas when they recognize a problem
(Example 2).

As open problems, we observed the following design problems of our
system. 1) It is often difficult to distinguish a problem from an idea. We need
to modify the design of IdeaManager or to give a clear guide for distinguishing
a problem from an idea. 2) If a lot of information is presented by a
pop-up, users do not feel like viewing it. We need to control the amount
of information in pop-ups. 3) The current pop-up algorithm of IdeaManager
is keyword matching. Sometimes, the retrieved information has an obvious relation
to the registered information and therefore does not stimulate users’ thinking. We
need to enhance the search mechanism.



62.4 Conclusions 

We presented our long-term creativity support system cooperating with a 
personal information storage system. The aim of this cooperation is to ena- 
ble interesting information acquired in daily life to work as a clue for idea- 
generation. In a long-term user study, we observed some evidence that pop- 
up functions worked effectively. We think that feasibility of our approach has 
been confirmed. 

We are now enhancing our system based on the results of this study and are
planning a formally designed experiment.

Analyzing conversations between customers and salesclerks in actual purchase
activities, we have found that appropriate information given by the clerk in a
timely context often caused a change in the customer’s focus, which then led to
her decision on what to buy. This mental leap phenomenon is similar to the
one often observed in creative activities such as design or concept formation.
We expect that the effect provided by creativity support systems can
play a similar role in purchase consulting systems as well. The
examples described in this paper can be thought of as cases of chance discovery
by skillful sellers as creative communicators.



63.1 Introduction 

Conversation or communication is a part of information exchange functiona- 
lities of stores. When customers shop in the real world, their communication 
with salesclerks often inspires them to purchase or enables them to make a 
smooth decision. We could learn a lot from the salesclerks’ communication 
when they handle customers based on individual salesclerk’s knowledge and 
experience. 

Pu et al. pointed out that communication is important for formation 
and clarification of needs in purchase activities at e-commerce sites [63.7].
Studying what communication is done in the actual purchase activities and 
how it influences the customers’ decision-making and sales promotion may be 
of importance in order to obtain hints for developing e-business in the future. 

Accordingly, our study collected and analyzed protocols from actual 
purchase activities. The result showed that providing appropriate informa- 
tion in a timely manner in decision-making process for shopping frequently 
causes customers’ focus to change resulting in their final decision. Such a 
mental leap phenomenon can also be seen in creative activities such as de- 
sign and concept formation. Therefore, we expect that the effect which can 
be obtained in creativity support systems plays a similar role in purchase 
consulting systems at e-commerce sites. 

63.2 Collecting Protocols of Actual Purchase Activities

We recruited 16 women in their 20s and 30s who could cooperate continuously
as participants in order to collect protocols of actual purchase activities.
When they went shopping without companions, we had them carry a tape recorder
to record their conversations with salesclerks in the stores.

A total of 107 protocols were collected. However, in 33 of them the
salesclerks’ voices were barely audible, so they could not be analyzed.
In another 23 cases, even when judged together with the reports, we could
not determine what demonstrative words such as “this” and “that” referred to.
Therefore, the remaining 51 cases were analyzed as protocol data. ^



63.3 Analysis and Result

63.3.1 Expected Reaction 

Usually, the role played by salesclerks as advisors is primarily to help custo- 
mers discover possible choices by presenting appropriate choices and to help 
them consider and evaluate choices by providing information. We call this 
kind of salesclerks’ response expected reaction in this study. 

For example, protocol sample #1 shown below is an instance of an expected
reaction. Here, responding to the customer’s requirement that “because the
item currently chosen (candidate A) is short, a longer one is better”, the
salesclerk presented another item (candidate B) matching the requirement.
This can be taken as an expected reaction, in which she affirmed the
customer’s requirement and presented a more appropriate choice. Whether
customers buy the item offered as a “more appropriate choice (from a certain
aspect)” depends on their evaluation of the item. Frequently, when another
choice is presented, the decision-making process moves to the next cycle, and
items are evaluated from another aspect. In this example, the customer, who
had requested a “longer” item focusing on the length, was presented with a
new choice matching her requirement and then turned her attention to the
“shape.” Figure 1 shows this.

Protocol sample #1: Expected reaction 
[Customer]: Do you have a longer one with this shape?

[Salesclerk]: Well, let me see this line... (After considering and searching
for a while) this one (candidate B) can be freely adjusted to your size.
This type is not hemmed from the beginning so that it can be adjusted to
each customer’s length. This one would be no problem at all. It is long
enough even for foreign models.

[Customer]: I see. What is its shape like?



^ All the conversation was in Japanese. The conversation shown in this paper is 
translated into English by the authors. 

[Figure: diagram with the labels “Provide another aspect” and “Mental leap”]

Fig. 63.2. Thinking process in the protocol sample #2



63.3.2 Unexpected Reaction 

A salesclerk’s reaction in actual purchase activities is not always an
expected one. An affirmative reaction to what the customer says is,
contextually, the expected advice. However, some customers cannot actually
reach a decision with this type of advice alone. In such cases, opposing the
customer’s evaluation or mentioning aspects the customer did not expect may,
on the contrary, increase the chance of a successful sale. Protocol sample #2
shown below is one example of such cases.

Protocol sample #2: Unexpected reaction (the case of a successful mental
leap)

[Customer]: This (candidate A) is a little short.

[Salesclerk]: Such a design is popular this year. Almost every shop deals
with short ones. Do you prefer a longer one?

[Customer]: Too short to cover my waist...

[Salesclerk]: It depends on the balance with your skirt or pants. ’Cause
you’re now wearing a shorter tight skirt, you think that way, but if you
wear a long skirt, you will feel better.

[Customer]: Really? Does it suit a long skirt better than pants?

[Salesclerk]: It depends on the shape, but tight pants would emphasize
your waist. So, a long skirt with an elongated shape would be better. (After
searching...) For example, this type (candidate B) would best suit your
current jacket.

[Customer]: OK, can I give it (candidate B) a try? 

For example, in protocol sample #2, the customer was reluctant because “the
jacket (candidate A) is short in length.” The salesclerk provided another
aspect, “the balance with lower clothes,” and persuaded her to think “it’s
not short” because “wearing it together with a long skirt is OK.” The
salesclerk recognized that the customer did not like the short one because
“it emphasizes her body shape,” and concluded that the issue to address was
not “the clothes being short in length” but “how to wear it so as not to
emphasize body shape.” Figure 2 shows this.

In this way, in some cases, seemingly disagreeing with the customer
conversely prompted her to make a decision. Because it opposes the
requirements or thoughts that the customer expressed verbally, this can be
taken as an unexpected reaction by the salesclerk, one that deviates from the
usual flow of presenting solutions to match requirements. Furthermore, the
unexpected reaction can bring about a substantial change, or leap, in what
the customer thinks. That is, an unexpected reaction by the seller in an
appropriate situation may create a chance for the customer’s decision-making
and a successful sale. In this sense, the unexpected reaction may be
considered an instance of chance discovery by the seller.



63.3.3 Successful Chance Discovery with Unexpected Reaction 

Making an unexpected reaction may also increase the risk of a failed
conversation and an unsuccessful sale, because it involves disagreeing with
the customer. However, it can conversely bring the chance to prompt the
customer to make a decision. Extracting from the protocols the unexpected
reactions that were effective in the customer’s decision-making, we found two
characteristic patterns of successful unexpected reactions:

1. By comparing with other items (extreme examples), broadening the cu- 
stomer’s range of thought and adjusting the scale of thought axes. 

2. By presenting a new focus different from the one which the customer 
currently has, jumping her to another thought space or prompting her 
to make a new discovery, i.e. supporting her mental leap. 

Interestingly, these unexpected reactions are similar to the characterization
of design activities in creativity support studies [63.1][63.2]. For example,
patterns 1 and 2 mentioned above can be thought of as methods similar to
innovative design and creative design, respectively, as defined by Gero
[63.2]. The protocol data we collected show that presenting a new focus, as
in pattern 2, is a particularly frequent reaction of skillful salesclerks who
are good at selling goods (i.e., at making a chance discovery).

In addition, our analysis showed that talking about scenes in which an item
is used can be an effective explanation of the item’s utility and helps the
mental leap go smoothly. Scene information itself serves to enhance
customers’ understanding of goods and to make their image of them clearer.
Indeed, the protocol data we collected include many cases in which scene
information is used. Providing scene information is expected to help the
customer take full advantage of the chance for a mental leap offered by a
successful unexpected reaction.

The protocols we collected confirm that scene information is frequently used
in situations where these unexpected reactions prompt a mental leap. For an
unexpected reaction by a salesclerk to cause a customer to make a mental
leap, that is, to jump from her current thought axes or focus, the new
thought axes or focus must be easy for the customer to accept. Unless the
customer’s mental leap goes smoothly, the unexpected reaction is more likely
to end in failure. Describing usage scenes may be effective in promoting the
customer’s understanding and, consequently, in helping unexpected reactions
succeed.



63.4 Discussion 

This paper analyzed conversations between customers and salesclerks in
actual purchase activities to show that the salesclerk’s reaction may
promote the customer’s mental leap. We found that providing appropriate
information in a timely manner frequently caused the customer’s thought
axis or focus to change, resulting in a decision. Interestingly, such a
mental leap phenomenon is similar to the one observed in creative activities
like design or concept formation.

Particularly, we demonstrated in detail that unexpected reactions made 
to the customer’s current requirements might promote her mental leap ef- 
fectively. We may say that the skillful seller or creative communicator is the 
seller who can discover chances from the communication context successfully 
and make unexpected reactions effectively. Furthermore, we presented the 
cases to show that providing scene information may be useful for supporting 
unexpected reactions effectively. We do not assert that an unexpected reac- 
tion is always superior to an expected reaction, or leads to more successful 
sales. However, we may say that the skillful seller is a creative communicator. 




63. Chance Discovery by Creative Communicators 467 



Such a mental leap phenomenon can also be seen in creative activities such
as design and concept formation. Therefore, we expect that many of the
results and much of the knowledge obtained from studies of creativity
support [63.3][63.4][63.5] can be applied to decision-making support in
purchase activities at e-commerce sites.

Abstract. In the context of education, a chance occurs when a learner
makes a mistake. It becomes a good opportunity for learning and brings new
knowledge. Novel phenomena are often given as counterexamples, which
indicate the difference between a learner’s prediction and the result of
her/his solution. It is, however, difficult for a learner to learn from
counterexamples, because if their significance is not clear, a learner often
ignores them and learning doesn’t occur. The role of a teacher (or tutoring
system) is to help a learner grasp the chance. Our research focuses on how to
evaluate the effectiveness of counterexamples. We propose a method of doing
so from two educational viewpoints: (1) Does it clearly suggest the
occurrence of an error? (Visibility), and (2) Does it suggest the cause of
the error? (Suggestiveness) Some case studies are presented to illustrate
these functions. Then, we’ll compare chance discovery in other fields with
ours, and discuss what is essential for chance discovery.

Keywords: CAI, education, discovery learning environment, simulation, 
counterexample 



64.1 Introduction 

First of all, we clarify our position. Chance discovery, according to Osawa [Osawa 
2000], aims at finding novel phenomena that indicate significant change in the 
future. It may be a good opportunity or a deadly risk according to human action.
It is, however, difficult to be aware of a chance, because it often needs some kind 
of discontinuity of thinking. In the context of education, we can find an analogy. In 
learning environment, a learner tries her/his solution and often makes mistakes. 
The occurrence of such errors becomes a good opportunity for learning, that is, a 
chance. Novel phenomena are often given as counterexamples, that indicate the
difference between her/his prediction and the result. They have potential to cause 
a learner’s conceptual change. It is, however, difficult for a learner to learn from 
counterexamples, because if the significance of them is not clear, a learner often 
ignores them and learning doesn’t occur. The role of a teacher (or tutoring system) 
is to help a learner grasp the chance. In computer-assisted instruction, such a
situation is often seen in discovery learning environments. A learner’s erroneous
action in a computer-simulated environment yields unexpected feedback. The task
of the system is to visualize it in an effective way to promote a learner’s
conceptual change. How to design such a mechanism is the aim of our research.



64.2 Chance Discovery in Learning Environment 

In the field of computer-assisted instruction, the discovery learning paradigm
has recently been gaining importance. Typically, a computer simulation of a
restricted environment (called a ‘microworld’) is constructed, in which a learner
can directly manipulate the existing objects and see the result of her/his action
intuitively. A learner explores the world and tries to discover the knowledge and
laws of the learning domain. Such a situation is educationally good because it
promotes a learner’s initiative, motivation and interest.

Discovery learning, however, has two sources of difficulty. The first is that it
requires several basic skills for the ‘discovery task,’ e.g., how to generate a
hypothesis and how to design an experiment to test it. A learner without such
skills often reaches an impasse or repeats aimless actions. She/he needs some
assistance. One way is to provide auxiliary tools which make the cognitive
process of discovery explicit. For example, in generating hypotheses or designing
experiments, it is quite difficult to find out what the essential elements of the
domain are, so providing a list of basic variables will be helpful. Hypothesis
Editor and Monitoring Tool [Joolingen 1999] are typical examples. Another way is
to provide more ‘intelligent’ assistance, which gives a learner advice concerning
the contents of the discovery task, e.g., suggesting a reasonable hypothesis
based on the data in hand, or judging the reasonableness of an experiment to test
the hypothesis. Electric Studio [Shoda 1999] is an example, whose domain is the
diagnosis of electric circuits. Designing such intelligent assistance requires
problem solvers for the discovery task. In particular, a hypothesis generator and
an experiment designer are essential: the former generates all reasonable
hypotheses based on the data in hand, and the latter generates all reasonable
experiments to test the hypotheses.

The other difficulty of discovery learning is, however, more serious. That is,
too much explanation by the system deprives a learner of her/his initiative,
which is the essential merit of discovery learning. (It may teach her/him what to
do step by step.) It is preferable that the phenomena in the learning environment
themselves make a learner aware of what to learn. Such ‘educational’ phenomena
often appear as counterexamples, which are phenomena a learner didn’t predict.
They impress on her/him the necessity of learning by suggesting the error in
her/his action. Thus, ‘learning from mistakes’ is promoted [Perkinson 2000].
Counterexamples, therefore, can become a chance.



64.3 How to Design Effective Counterexamples 

A counterexample, however, must be used carefully in discovery learning. A
learner often dismisses the anomalous data as a measurement error, or excludes
it from the range of the hypothesis [Chinn 1993]. Even when she/he accepts the
counterexample, without any help she/he reaches an impasse and cannot arrive at
the correct hypothesis [Fukuoka 1994, Nakajima 1997]. Therefore, it is necessary
to evaluate the educational effectiveness of counterexamples in order to decide
whether or not to show them to a learner. (Inappropriate counterexamples confuse
a learner.) In general, the following are essential [Fukuoka 1994, Nakajima 1997]:

(1) Counterexamples must be recognized to be meaningful and acceptable. 
When the difference is clear and reliable between the real phenomenon and a 
learner’s prediction, she/he easily accepts it and reconsiders her/his idea. 

(2) Counterexamples must be suggestive, to lead a learner to correct under- 
standing. They must include sufficient information for this. 

We have been studying the ‘counterexample-management’ according to the 
viewpoints above. Our domain is mechanics education. Error-Based Simulation 
(EBS) is used, which simulates a learner’s erroneous equation of motion [Hi- 
rashima 1998, Horiguchi 1998, 1999, 2000]. As a counterexample, EBS is evalu- 
ated as follows: 

(1') Does objects’ erroneous motion in EBS make a learner aware of the
occurrence of error? (Visibility)

(2') Does objects’ erroneous motion in EBS suggest the cause of error? (Sug- 
gestiveness) 

We have designed such a mechanism and developed the EBS management system,
whose usefulness has been confirmed through several experiments. In the following
two sections, we illustrate the framework for counterexample management using
EBS. Then, we'll compare chance discovery in other fields with ours, and discuss
what is essential for chance discovery.

64.4 Designing ‘Visible’ Counterexamples

Error-Based Simulation [Hirashima 1998] 

EBS is generated by mapping an erroneous equation to a simulation (Figure 1).
It shows unnatural motion in contrast with the correct one (a learner is assumed
to predict the correct motion). The differences arouse cognitive conflict, which
promotes reflection by a learner. Figure 2a shows an example. When a learner sets
up Equation-B for the Block, EBS shows it ascending the Slope. The difference
between EBS and the correct motion is clear, so it visualizes a learner’s error
well.



Clarity and Reliability [Horiguchi 1998, 1999] 

In other cases, the difference isn’t always clear. In Figure 2b, for example,
the EBS generated from Equation-C only shows the Block moving in the correct
direction along the Slope (its velocity is a little different). A learner has
difficulty judging whether the motion is correct or incorrect. The same occurs
for Equation-D. Changing some parameters in the simulation often makes the
difference clear. In the case of Equation-C, when the angle θ of the Slope
increases, the velocity of the Block decreases, while it increases in the correct
motion. For Equation-D, when θ becomes zero, the Block still moves on the flat
floor! Such unnatural changes in behavior also stimulate a learner, but it must
be noted that a ‘too large’ parameter change spoils the reliability of the
simulation itself. The criteria that evaluate the clarity of the EBS’s difference
from the correct motion (called ‘Criteria for Error-Visualization’; CEV) say that
the more qualitative differences in velocity and/or acceleration the EBS has, the
clearer it is. The criteria that evaluate the reliability of EBS say that the
larger the parameter change applied to EBS, the less reliable it becomes. We
previously proposed an EBS-management mechanism using these two kinds of
criteria, and confirmed its usefulness in a pilot test.
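
The interplay of the two kinds of criteria can be sketched as a small
selection routine. This is only an illustration under stated assumptions: the
text gives the criteria qualitatively, so the numeric `param_change` scale, the
`max_change` threshold, the tie-breaking order and the sample candidates are
all hypothetical, and `choose_ebs` is not the authors' mechanism.

```python
from dataclasses import dataclass

def sign(x):
    """Qualitative value of a quantity: -1, 0 or +1."""
    return (x > 0) - (x < 0)

@dataclass
class Candidate:
    label: str            # hypothetical name for an EBS variant
    param_change: float   # relative size of the parameter change (0 = none)
    velocity: float       # velocity (or its change) shown by the EBS
    acceleration: float   # acceleration (or its change) shown by the EBS

def clarity(c, correct_velocity, correct_acceleration):
    """Count qualitative (sign) differences from the correct motion."""
    return (int(sign(c.velocity) != sign(correct_velocity))
            + int(sign(c.acceleration) != sign(correct_acceleration)))

def choose_ebs(candidates, correct_velocity, correct_acceleration, max_change=0.5):
    """Pick the clearest candidate whose parameter change stays within
    `max_change`; ties go to the smaller (more reliable) change.
    Returns None when no acceptable candidate differs qualitatively."""
    usable = [c for c in candidates
              if c.param_change <= max_change
              and clarity(c, correct_velocity, correct_acceleration) > 0]
    if not usable:
        return None
    return max(usable,
               key=lambda c: (clarity(c, correct_velocity, correct_acceleration),
                              -c.param_change))

# Invented candidates: the correct motion has positive velocity and acceleration.
candidates = [
    Candidate("no parameter change", 0.0, 0.8, 0.8),       # only quantitatively different
    Candidate("moderate angle increase", 0.2, -0.3, -0.3),  # qualitatively different
    Candidate("extreme angle increase", 0.9, -2.0, -2.0),   # clear but unreliable
]
best = choose_ebs(candidates, correct_velocity=1.0, correct_acceleration=1.0)
print(best.label if best else "no suitable EBS")
```

The unchanged simulation is rejected because it differs only in magnitude,
and the extreme change is rejected as unreliable, which mirrors the trade-off
described above.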

Both criteria define the ‘visibility’ of EBS as a graphical presentation. The
former corresponds to the ‘effectiveness’ criteria and the latter to the
‘expressiveness’ criteria in the field of information visualization research
[Mackinlay 1986]. The merit of this approach to managing EBS is that it doesn’t
need a problem solver for the learning domain, and that it can transform the
issue of a learner’s problem-solving ability into the issue of her/his ability to
perceive motion.



Designing ‘Suggestive’ Counterexamples [Horiguchi 2000] 



Error-Identification 

The merit of ‘visibility’ viewpoint is its simplicity. It does not depend on the 
problem-solving process but only on the resulting phenomena, so it is compara- 
tively easy to design the evaluator of counterexamples’ effectiveness. Such coun- 
terexamples, however, don’t always provide a learner useful information to correct 
her/his error, and sometimes mislead her/him. This comes from the lack of
consideration of the problem-solving process. Therefore, paying attention to the
‘suggestiveness’ viewpoint is also important.

Evidently, a problem solver which can construct the correct equation is
necessary. We developed one by modelling the process of formulating equations in
mechanics. The model focuses mainly on the process in which a learner enumerates
the forces acting on the objects, so it consists of a set of production rules
called Force-Enumerating Rules (FERs). They describe the conditions under which
forces act. Some of them are shown in Table 1.

The correct solution inferred by the problem solver is compared with the one by 
a learner (inputted through the interface which allows her/him to construct equa- 
tions and diagrams), and the differences are checked. Some rules are necessary 
which link the error-appearance on her/his solution to its cause. By considering a 
learner’s error as the error about FERs, the rules are formulated as shown in Table 
2. They are called Error-Identification Rules (EIRs), which link the erroneous part 
of a learner’s solution to its cause and instruction strategy. 
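
The comparison between the inferred and the learner's solution can be
sketched as follows. Tables 1 and 2 are not reproduced here, so the rules
below are invented stand-ins: the conditions, the force names and the
situation keys are assumptions, and the sketch covers only missing or extra
forces, not errors in a force's direction or the choice of instruction
strategy.

```python
# Hypothetical Force-Enumerating Rules: each pairs a force with the
# condition under which it should appear in the equation of motion.
FERS = [
    ("gravity",  lambda s: s["has_mass"]),
    ("normal",   lambda s: s["on_surface"]),
    ("friction", lambda s: s["on_surface"] and s["surface_rough"] and s["moving"]),
    ("tension",  lambda s: s["attached_to_string"]),
]

def enumerate_forces(situation):
    """Apply every rule and collect the forces whose conditions hold."""
    return {force for force, condition in FERS if condition(situation)}

def identify_errors(situation, learner_forces):
    """Compare the learner's forces with the inferred ones and report each
    discrepancy as a (kind, force) pair naming the suspected rule."""
    correct = enumerate_forces(situation)
    errors = [("missing", f) for f in sorted(correct - learner_forces)]
    errors += [("extra", f) for f in sorted(learner_forces - correct)]
    return errors

# Invented situation: a block sliding on a rough slope, no string attached.
block_on_slope = {
    "has_mass": True, "on_surface": True, "surface_rough": True,
    "moving": True, "attached_to_string": False,
}
print(identify_errors(block_on_slope, {"gravity", "normal", "tension"}))
# [('missing', 'friction'), ('extra', 'tension')]
```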

Suggestiveness 

The identified cause of error must be visualized and suggested by EBS. The last 
set of rules we need is the one which describes what kind of motion in EBS sug- 
gests what kind of misconception in problem solving. The fundamental idea is 
quite simple. When a human observes an object moving, she/he feels its ‘motive 
force’ working. We apply this fact to the difference of motion. For example,
assume that a learner observes a block moving to the left with deceleration when
she/he predicted that it would move to the left with acceleration. She/he will
feel that a force which acts to the left is missing, or that a force which acts
to the right is extra. The same reasoning is possible for the relative motion of
two objects. The rules are formulated as shown in Tables 3 and 4. They are called
Criteria for Cause-of-Error Visualization (CCEVs), which link the motion in EBS
to the cause of error suggested.
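
The fundamental idea can be written down for a single axis. The function
below is a simplified reading of the example in the text, not a reproduction
of Tables 3 and 4: the name `suggested_cause`, the sign convention (positive
means rightward) and the numeric values are assumptions.

```python
def suggested_cause(predicted_acc, observed_acc):
    """Map the difference between predicted and observed acceleration along
    one axis (positive = rightward) to the force error it suggests."""
    diff = observed_acc - predicted_acc
    if diff > 0:
        return "a rightward force is extra, or a leftward force is missing"
    if diff < 0:
        return "a leftward force is extra, or a rightward force is missing"
    return "no force error is suggested"

# Predicted: accelerating to the left (-2.0).
# Observed: moving left while decelerating, i.e. rightward acceleration (+1.0).
print(suggested_cause(predicted_acc=-2.0, observed_acc=1.0))
```

For the block that decelerates instead of accelerating to the left, the
sketch returns the same two candidate causes named in the text.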

For example, consider the problem in Figure 3. When a learner constructs the
equation in Figure 3a (the direction of the friction μN is erroneous), the EBS
(with no parameter change) becomes as shown in Figure 3b, in which the string
shrinks. According to the CCEV in Table 4, however, this motion suggests an error
about tension, which is not the cause of error in this case. Therefore, another
EBS (with a parameter change) is searched for, as shown in Figure 3c. When the
mass m2 increases, the block's velocity increases according to the erroneous
equation, while it decreases according to the correct equation. This satisfies
the CCEV in Table 3, and the EBS correctly suggests the error about the friction
μN (Figure 3c).

64.5 Discussion

The major feature of education in chance discovery is, most people may consider,
that in this field there is the Omniscient: a teacher. This view is, however, not
exactly accurate. We'll explain the reason. In other fields, one of the
difficulties in chance discovery evidently comes from unknowability: becoming
aware of the importance of data that was regarded as trifling or out of range
often brings discovery. It needs knowledge that cannot be written in advance, so
it isn’t intrinsically algorithmic. This is why collaboration between human and
computer is necessary. On the other hand, in education, a teacher knows
‘everything’ (or is expected to). She/he knows what kind of error prompts a
learner to learn; that is, what the chance for a learner is and when it occurs
are knowable. However, how can she/he make a learner aware of it?

Of course, saying ‘This is a chance, learn!’ has no effect. It is necessary to
arouse ‘reasonable doubt’ about a learner’s (wrong) knowledge, to orient her/him
toward correct understanding. A teacher must provide the appropriate information
for this. In fact, being aware of a chance and making a learner aware of it are
quite different matters. The advantage of a teacher is rich knowledge of the
learning domain. She/he can analyze it and design appropriate instruction.
Visualization, some examples of which were shown in this paper, will be one
promising method.

We conclude this paper by pointing out another important viewpoint. In this
paper, we considered only elementary problems in the learning domain, and assumed
a learning process of average level. When the problems become more difficult, it
also becomes difficult to reason about a learner’s thinking process. The solution
may need some kind of discontinuity of thinking, like ‘insight.’ In such a
situation, even for a teacher, it is not apparent what kind of error promotes
learning for what kind of learner. She/he needs to discover a chance.

Stories of recent failures in complex systems tell us that they could have
been avoided if the right information had been presented to the right person
at the right time. We propose a method for fault detection in spacecraft by
mining association rules from housekeeping data. We also argue that merely
detecting anomalies is not enough for failure prevention. We present a
framework of design information management in order to capture and use design
rationale for failure prevention. We believe that the framework provides the
basis for an improved development process and effective anomaly handling.



65.1 Introduction 

Recently, we have experienced a series of failures in complex systems in Japan
as well as in other parts of the world. To list a few, there were the
criticality accident at a nuclear fuel plant in Tokai Village in September
1999, the failure of the launch of the Japanese Space Agency’s flagship H-II
rocket in November 1999, and the fatal subway crash in Tokyo in March 2000.

Why do these accidents happen? One of the reasons is that the scale and
the complexity of such systems are intractable for a single person. As a
result, oversights happen more often than before in design or during
operation, leading to accidents. Von Braun is said to have understood the
whole system of the Saturn V rocket. But it is not possible for a single
person to understand a whole system with the scale and complexity of current
systems, like space shuttles. Therefore, we believe that computer support is
needed to manage the scale and complexity in the development process.

As the scale and complexity of a system grow, the number of people needed
for the development of the system increases. As a result, communication
between people becomes one of the crucial aspects of the development process.
Indeed, many of the recent failures can be attributed to poor communication
between developers or between development and operation. Computers are
already widely used for communication such as e-mail, but we believe
computers could do better than merely sending e-mails back and forth.

Failures, especially catastrophic ones, do not happen suddenly. Usually,
there are some signs indicating the imminent failure. This fact is expressed
in Heinrich’s law: behind one catastrophic accident, there are 29 less severe
accidents and 300 near misses. By paying attention to these events, it might
be possible to find a way to avoid imminent catastrophic accidents [65.1].
But if we were not aware of those events, we would not be able to take any
measures.

In order to prevent failures, we take two approaches: 1) fault detection 
by data mining, and 2) managing information for failure prevention. Data 
mining techniques can be used to detect anomalies that otherwise will be 
overlooked. However, anomalies themselves do not manifest any semantics, 
as they are discovered solely based on statistical properties of the data. It is 
humans who perceive the semantics of anomalies in order to make the best
use of them. We propose a framework for managing design information, not 
only for the improved design process, but also for providing support in the 
operation of the system, especially in handling anomalies of the system to 
prevent failures. 



65.2 Fault Detection of Spacecraft by Mining 
Association Rules of Housekeeping Data 

Fault detection is one of the key issues in the development of advanced
spacecraft. Although several detection techniques, including limit-sensing,
simulation and expert systems, have been employed for this purpose, they have
often overlooked small anomalies in the housekeeping data, and some of these
have led to fatal damage to the overall mission.

One reason for the difficulty is that conventional fault detection methods 
generally require a tremendous amount of a priori knowledge on the system behavior for
each spacecraft, whereas that kind of knowledge is not always easily available. 
For example, a perfect dynamics model for simulation or a complete set 
of production rules for expert system is usually too expensive to prepare 
for each spacecraft. Another reason is that these methods can grasp only 
limited aspects of overall spacecraft system behavior. For example, limit- 
sensing examines only upper and lower bounds of individual sensor values, 
and dynamics simulation can be performed merely on several subsystems 
such as attitude control system. 
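
A minimal sketch of limit-sensing shows why it sees only individual sensor
values. The sensor names, the limits and the assumed coupling between the two
signals are invented for illustration and do not come from any real
spacecraft.

```python
# Hypothetical sensor limits; real limits are mission-specific.
LIMITS = {"battery_voltage": (24.0, 32.0), "panel_current": (0.0, 10.0)}

def limit_check(sample):
    """Return the names of sensors whose values fall outside their limits."""
    return [name for name, value in sample.items()
            if not (LIMITS[name][0] <= value <= LIMITS[name][1])]

# Both values are within range, so nothing is flagged, even if (as assumed
# here) a low voltage together with a high panel current would be unusual.
print(limit_check({"battery_voltage": 24.5, "panel_current": 9.5}))  # []

# Only a value outside its own bounds raises an alarm.
print(limit_check({"battery_voltage": 33.1, "panel_current": 9.5}))  # ['battery_voltage']
```

Because each sensor is tested in isolation, an anomaly that shows up only in
the relationship between signals passes unnoticed.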

We proposed a fault detection method for spacecraft based on data-mining
techniques [65.2]. In this method, first, a set of association rules, which
describe “qualitative” relationships among time-series sensor signals, is
mined from the accumulated spacecraft housekeeping data. Then, the rules are
applied to monitored spacecraft telemetry to detect anomalies.

The proposed method consists of the following three steps: 1) pattern
clustering of time-series data, 2) extraction of association rules among
patterns sampled from different time-series, and 3) real-time monitoring with
the acquired association rules. The first two steps are performed on data
which is telemetered to the ground station and accumulated during the initial
phase of operation after the launch of the spacecraft. In the last step, the
association rules obtained in the previous step are applied to the real-time
telemetry from the spacecraft (Fig. 65.1).
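
The three steps can be illustrated with a deliberately small sketch. It is
not the method of [65.2]: simple trend labelling ("rise", "fall", "flat")
stands in for pattern clustering, rules are restricted to pairs of signals,
and the window width, thresholds, signal names and data are all assumptions.

```python
from collections import Counter

def pattern(window, eps=0.1):
    """Step 1 stand-in: label a window by its overall trend instead of clustering."""
    delta = window[-1] - window[0]
    if delta > eps:
        return "rise"
    if delta < -eps:
        return "fall"
    return "flat"

def symbolize(series, width):
    """Cut a series into consecutive windows and label each one."""
    return [pattern(series[i:i + width])
            for i in range(0, len(series) - width + 1, width)]

def mine_rules(sym_a, sym_b, min_support=2, min_conf=0.9):
    """Step 2: mine rules 'A shows p -> B shows q' from co-occurring labels."""
    pair_counts = Counter(zip(sym_a, sym_b))
    a_counts = Counter(sym_a)
    return {p: q for (p, q), n in pair_counts.items()
            if n >= min_support and n / a_counts[p] >= min_conf}

def monitor(rules, label_a, label_b):
    """Step 3: report an anomaly when a rule fires but its consequent fails."""
    return label_a in rules and rules[label_a] != label_b

# Invented housekeeping data from the initial phase of operation.
heater = [0, 1, 2, 2, 2, 2, 2, 1, 0, 0, 1, 2, 2, 1, 0, 1, 1, 1]
temp = [10, 11, 12, 12, 12, 12, 12, 11, 10, 10, 11, 12, 12, 11, 10, 11, 11, 11]
rules = mine_rules(symbolize(heater, 3), symbolize(temp, 3))
print(rules)

# Real-time window: the heater signal rises but the temperature falls.
print(monitor(rules, pattern([0, 1, 2]), pattern([12, 11, 10])))
```

Every value in the final window lies inside the range seen during the initial
phase, so the anomaly is visible only as a violated relationship between the
two signals.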

As our method attempts to check the system behavior from a different
point of view using a different source of knowledge, we expect that it will be
able to detect some sorts of anomalies which have usually been overlooked
by conventional methods. However, it is unlikely that our method will detect
all kinds of system faults. In other words, conventional methods such as
limit-sensing or simulation may well be more suitable for detecting some
classes of faults. Therefore, we are going to investigate further for what
kinds of faults our method is more effective than other approaches.



[Figure 65.1 diagram. Box labels: Data from Initial Phase of Operation;
1. Analyze clusters of time-series patterns; 2. Mine association rules
between time-series; Real Time Data; 3. Detect faults on-line and present
results.]

Fig. 65.1. The process of detecting anomalies of a spacecraft by mining association
rules from housekeeping data.



65.3 Managing Information for Failure Prevention 

65.3.1 Using Design Information for Failure Prevention 

In this paper, we present an approach toward managing design information for
failure prevention. We believe that certain kinds of failures can be prevented
by bringing the right information at the right time to the right person.

Fig. 65.2. The process of handling anomalies of a spacecraft with Design
Information Repository.



The idea behind our approach is to store as much information as possible
on the system while minimizing the cost of data input, and to organize the
information in such a way that it can be processed computationally
and reused later for failure prevention. In order to implement the idea, we
propose a framework called Design Information Repository (DIR) to manage
design information. The purpose of the framework is to allow every person
involved in the system to access the design information and make the best use
of all the information available to prevent failures.

In the process of development, manufacturing, and operation of a system,
a large amount of varied information on the system is produced: design
documents, drawings, communications between the parties involved, feedback
from manufacturing and operation divisions, etc. Among the various kinds of
design information, we put special emphasis on capturing and using design
rationale. Design rationale is defined as the knowledge about the artifact
explaining how and why it is designed the way it is. Design rationale is
considered to be useful for supporting design problem-solving [65.3].

There has been much effort on capturing design rationale over the past decade.
Shipman et al. have described three perspectives of design rationale capture:
argumentation, communication, and documentation [65.4]. In the argumentation
perspective, what is to be captured as design rationale is the designer's
reasoning that occurred during the design activity. Since the argument
structure is readily available, the argumentation perspective benefits from
various reasoning techniques. However, since human thinking itself is not
structured, it takes a lot of effort to structure the argument for input into
the system. This problem is known as the capture bottleneck. In the
communication perspective, the capture bottleneck is not present because
communication between designers is recorded as it is. However, because the
information lacks structure, the reasoning services of the argumentation
perspective are not available.

65.3.2 Design Information Repository 

Not only design documents and drawings, but also communications between
designers, arguments on design decisions, and feedback from manufacturers
or operators can be design information. As a way to record as much
information as possible and to capture design rationale, while allowing
further formalization of the information as users demand, we propose
a framework for incremental formalization and organization of information.
The idea is to have all design information stored electronically and to allow
access by every person involved in the target system. We call the framework
Design Information Repository.

In this framework, all information is stored in the repository. Information
stored in the repository can be design documents, drawings, e-mails,
simulation results, anomaly reports, etc. Each piece of information has typed
attributes, and queries can be performed on them. An ontology can be
incorporated in the system to characterize information in a way suitable for
each application domain.

Around the repository, DIR provides support for project management,
support for communication between developers, and a reasoning service. With
project management support, users, especially the project manager, will be
able to manage the issues at hand. Users will be able to record issues or
concerns, arguments related to those issues, and how they are resolved. A
portion of a document stored in the repository can be registered as an issue,
as an argument, or as a resolution to an issue. Thus, information in the
repository will be organically structured around the network of issues. Later,
this structure can be used by a reasoning engine and an application-specific
knowledge base to catch unresolved issues or conflicting design decisions.

With communication support, users can explicitly register with DIR their
requirements or requests to other subsystems by embedding predefined tags in
e-mails. DIR will check the deadlines of requests and send out notices when a
deadline arrives. All the e-mails are stored in the repository and become a
source of information. For example, question-answer pairs can be useful
information. Our vision is that, by introducing DIR, the development process
will be improved through efficient handling of issues and effective
communication, and that DIR will be a useful source of information in handling
anomalies.

65.3.3 Handling Anomalies

Detecting anomalies does not in itself prevent failures. The technique we
presented in the previous section is based on statistical properties of the
telemetry data and gives no indication of how critical the anomalies it
detects are. It is a human who has to judge the significance of detected
anomalies.

One scenario in which we would like DIR to play an important role is
anomaly handling. In handling anomalies, it is crucial to present “the right
information at the right time.” In an emergency especially, prompt
access to the needed information is crucial for dealing with the situation. The
process of handling anomalies is shown in Fig. 65.2. With DIR, the operator
handling an anomaly can get quick access to the relevant information. By
incorporating an ontology, the operator can search for information by
component, by behavior of the system, or by a combination of the two.



65.4 Current Work and Conclusions 

This paper argued that, in order to prevent certain kinds of failures in
complex systems, it is important both to detect anomalies in the system and to
manage design information in such a way that the right person has the right
information at the right time. We believe that both detecting
potential failures in the system and helping humans resolve the presented
risks by providing appropriate information are important for preventing
failures of complex systems. Thus, our approach is to combine data-mining
techniques and design information management. However, many issues remain to
be investigated for both methods before combining them.

We are now implementing the DIR framework. We plan to apply it to
an artificial satellite development project and evaluate the concept in a
real-life situation.



Abstract. A rare opinion may be more meaningful than ones supported by
the majority of people. Such an opinion breaks into a popular concept if
people become aware of the opinion and admire it as highly acceptable,
and the prevalent support grows to be an established consensus. This paper
is dedicated to aiding in finding the seed of this process, i.e., an opinion
with latent popularity. The structure of the co-occurrence (occurrence
in the response by the same subject) between opinions is shown to be the key
to identifying such an opinion. KeyGraph, a text indexing method, is applied
to accumulated questionnaire-result data for visualizing such a structure.
The experiments show that the combination of human imagination and the
output of KeyGraph clarifies the significance of opinions for a forthcoming
consensus.



66.1 Introduction: Which Opinions Grow into Consensus?

Rare information and opinions sometimes grow into a prevalent concept, if they
satisfy the desire of people for information [1]. Our aim is to detect opinions the
prevalence of which can satisfy a wide range of people. People first become aware
of such an opinion and accept it, and the idea may then grow to be established.
Through this process, the popularity of goods such as cellular phones
has grown and is now established. In this paper, the problems addressed are:

1) What kind of opinions grow into a consensus?

2) How can we support human awareness of such opinions?

We will point out why previous analysis methods for social survey data cannot
find such a growable opinion, and KeyGraph [2] is presented as an alternative
method fitting our goal. Using questionnaires about Kobe citizens' awareness of
and activities against risks, we show that KeyGraph extracts rare but meaningful
opinions. The growth in significance of the opinions obtained in this way is
evaluated through interviews with a group of subjects.



66.2 KeyGraph for Noticing Consensus Seeds from
Questionnaire

Opinions of people have been analyzed by several kinds of social surveys. Here, 
let us consider questionnaires where each question is answered in the manner of 
selecting one from a prepared set of opinions. 

A number of methods for analyzing questionnaire results appear in
commercial software packages for data analysis. This recent phenomenon implies
that existing methods, e.g., path analysis [6], covariance analysis,
multivariate analysis, clustering, and several hypothesis-testing approaches,
are to some extent accepted for social and market surveys. Several devices
have also been proposed in the area of data mining for discovering
comprehensible knowledge about public opinions from questionnaire data. These
methods focus on learning frequent patterns in data rather than on detecting
rare but significant opinions.

Although some data mining methods have succeeded in explaining the conditions
for the occurrence of rare events [7], our aim is different. We look for a new
opinion relevant to the common values of two or more communities, i.e., an
opinion that appeared rarely in the past and is relevant to previous contexts
of interest A1, A2, ..., or Am, each coming from a different community. That
is, different communities that have little chance to meet can, when they do
meet, trigger the innovation of a new idea, as mentioned in several
philosophical studies [3,4]: they can discover unnoticed new ideas that are
meaningful to all of those communities. Likewise, awareness of the relation
between one's initial values (constructed in one community) and new
information (coming from another community) sometimes triggers the discovery
of new knowledge, if the people involved have a proper context to share in
talking or doing something together [5].

In order to identify context A1, it is helpful to find a set of opinions b1,
b2, ..., bn made under the context of A1. Under A1, these opinions tend to
co-occur, i.e., to be conceived by people in the same community. However, it
is rare that all of {b1, b2, ..., bn} appear in the opinion set of the same
people. It is more usual that the set is decomposed, e.g., into {b1, b2, ...,
bx, ..., by} and {bx, ..., by, ..., bn}, appearing in the answers of people
with different ideas who share the context. A set of opinions of this kind can
be extracted by connecting multiple different sets of answers to a
questionnaire, each set co-occurring in the answers of the same subjects.
Given such questionnaire data, an appropriate method for extracting the set of
opinions in a community is to follow these two steps:

Step 1) Find co-occurring pairs of answers in the data, e.g., {b1-bx, bx-by, by-bn}.
Step 2) Connect the co-occurring pairs to form the cluster of opinions under the
common context, as {b1-bx-by-bn}.
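
The two steps can be sketched in Python as follows. This is a minimal
illustration, not the KeyGraph algorithm itself: co-occurrence is measured
here simply by counting the subjects who gave both answers, and the function
names and the `min_count` threshold are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(responses, min_count=2):
    """Step 1: pairs of opinions given together by at least min_count subjects."""
    counts = Counter()
    for answers in responses:
        counts.update(combinations(sorted(set(answers)), 2))
    return [pair for pair, count in counts.items() if count >= min_count]


def connect_pairs(pairs):
    """Step 2: merge linked pairs into clusters (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(clusters.values(), key=sorted)
```

With the answer sets {b1, bx}, {bx, by} and {by, bn} each given by two
subjects, `connect_pairs(cooccurring_pairs(responses))` yields the single
cluster {b1, bx, by, bn}, although no subject gave all four answers.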

The result will be a cluster such as the triangle formed by 84#, 202#, and 1#
in Fig. 1, where a term denoted as m# is an opinion in the data as in Eq. (1).
Some opinions may stand alone, as 76# does. Each of these separate opinion
sets corresponds to a community of people sharing a context.




66. Action Proposal as Discovery of Context 







Fig. 1. Clusters of opinions in communities/contexts, and a growable new opinion 249#. 

An opinion such as 249# in Fig. 1, commonly relevant to the interests of these
communities, is then expected to be a new topic or idea whose significance
grows into a broad consensus of society, if the opinion is newly appearing.
KeyGraph [3] first follows steps 1) and 2) above to obtain clusters of
co-occurring frequent answers in a questionnaire survey, corresponding to the
contexts of existing communities. Then, it obtains answers that are not as
frequent as those in the clusters but co-occur with multiple clusters. The
answers obtained are regarded as opinions that can grow into a broad social
consensus related to the contexts of multiple communities. By visualizing
these relations, KeyGraph helps the user become aware of significant rare
opinions.
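
The second stage can be sketched in the same spirit. The function below is a
hedged illustration rather than KeyGraph's actual scoring: it takes clusters
that have already been found, and returns the infrequent answers that lie
outside every cluster yet co-occur with members of at least two clusters. The
name `bridging_opinions` and the parameters `max_freq` and `min_links` are
hypothetical.

```python
from collections import Counter


def bridging_opinions(responses, clusters, max_freq=3, min_links=1):
    """Rare opinions outside all clusters that co-occur with >= 2 clusters.

    responses: list of answer lists, one per subject.
    clusters:  list of sets of opinions (the existing contexts).
    """
    clustered = set().union(*clusters)
    freq = Counter(op for answers in responses for op in set(answers))
    links = Counter()  # (opinion, cluster index) -> co-occurring subjects
    for answers in responses:
        chosen = set(answers)
        touched = [i for i, c in enumerate(clusters) if chosen & c]
        for op in chosen - clustered:
            for i in touched:
                links[(op, i)] += 1
    result = []
    for op in sorted(freq):
        if op in clustered or freq[op] > max_freq:
            continue
        linked = sum(
            1 for i in range(len(clusters)) if links[(op, i)] >= min_links
        )
        if linked >= 2:
            result.append(op)
    return result
```

An answer that is rare but touches two clusters, like 249# in Fig. 1, is
returned, whereas an isolated answer like 76# is not.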



66.3 Family Perception of Risks and Opportunities 

Here, we illustrate our method with the answers to a questionnaire about
one's awareness of various risks and of opportunities to survive them. The
questionnaire survey of citizens of Kobe was conducted by one of the authors
(Y. Nara) and CO-OP after the South-Hyogo earthquake disaster (M7.0, Kobe in
1995, 6600 people victimized). The survey ran from September 10 through 30,
1998. Subjects were chosen by random sampling from the Kobe Consumers
Associated Society (0.6 million households). 770 valid forms were collected
(50.8% valid). The questionnaire, as a whole, aimed at surveying

(1) Citizens’ interest in disaster protection activities, and the process by
which they move from mere awareness of disaster risks to protection activities.

(2) The demands of citizens for the promotion of CO-OP products. 

The data dealt with in this paper were taken in 1998. Note that the subjects
were in a situation three years after the disaster they were directly involved
in. This situation can be interpreted as one in which they had learned what
happens in a great earthquake but were forgetting some part of the feeling
they had had.

66.3.1 The Results of KeyGraph

An example output of KeyGraph is shown in Fig. 2, the output for the data
including the answers to all questions from all subjects. Hereafter, black
(dense-colored) nodes and black (solid) links form pre-existing communities,
i.e., clusters, and red (thin) links and red (arrowed) nodes show the links
among clusters and new and meaningful opinions. Some red links are between
black nodes instead of between red and black nodes. This is because a node's
co-occurrence with its own cluster is counted but not shown by red links, for
that co-occurrence is already depicted by the black links within the cluster.

The large cluster in the center seems to be made of the family culture of
older people in Japan, or the feeling of people caring much about their
families, e.g., 31-a-4 “we visit neighbors on moving to a new residence,” “I
am caring about the health of my family members,” etc. On the other hand, the
single-node cluster 14-d-3 on the left-hand side of the figure is an answer
saying “my family often talk about what to do if the next quake-disaster
comes.” The red node 29-5-1 standing between these two clusters means “I like
the home-delivery system of CO-OP.” In a discussion among people who knew the
background situations of the subjects, we interpreted this figure as showing
that people like CO-OP home delivery if they have babies or bedridden elderly
people, because they can hardly go shopping (note: Japan has no social system
of baby-sitters).




Fig. 2. The KeyGraph output for all the questions.



66.3.2 Which Opinions Grew into Consensus?

A group interview took place with 7 university students together in a
discussion room. They looked at each figure, as shown in the examples above.
We had 12 red nodes in the five figures presented, and two of the figures had
no red nodes. The results can be summarized as follows. For seven of the red
nodes, the students did not make definite comments at the beginning. Then, in
the discussion, they agreed on an interpretation for each red node. For each
cluster of black nodes, the students agreed on one interpretation of what kind
of people (i.e., interest-context) the community corresponds to.

After the interview, the students and the second author (Y. Ohsawa) talked about
the results without looking at the figures. At this point, all the comments were
concentrated on the red nodes, not on the black nodes. The reader might think
that the color (red) made the nodes stand out, but the students first made
comments about the black nodes. As a result, the red nodes corresponding to
“new growable opinions” grew from minor into major. Furthermore, creative
ideas for the management of CO-OP came out of Fig. 2, e.g., that customers
buying adult or baby incontinence products are good targets for the
home-delivery service.



66.4 Conclusions 

A model of the process by which a minor opinion grows into a consensus of the
majority of people was given, and the algorithm of KeyGraph was shown to
correspond to the model. KeyGraph was applied to questionnaire data about
people's awareness, as families, of various risks. The visual output was
validated as an aid to the growth of meritorious knowledge.



67.1 Introduction

While various techniques for chance (or risk) discovery have been proposed so
far, they mainly analyze symbolized time-series data such as text or monthly
sales amounts. On the other hand, we can now easily access unlabeled
digitized data such as audio or video signals, owing to the recent development
of networks, computers, and video devices. In this paper, we focus on pattern
retrieval methods, which enable us to discover chances directly from such raw
data.

The basic idea is that we extract a certain kind of feature from each frame
and find similar intervals in the feature sequence. Those similar intervals
can be handled as symbols, and various relationships among the symbols can be
extracted. In this paper, we call a database an input and a query a reference,
following the history of recognition methods [67.1]. For some methods, a
feature sequence is transformed into a vector quantized (VQ) sequence.

Let us classify time-series retrieval methods into three categories as
follows: (1) retrieval of exactly identical intervals, (2) retrieval of
temporally warped similar intervals, and (3) retrieval of intervals whose
order of sub-intervals is similar [67.7]. Methods of the first kind were
proposed for text retrieval [67.1].

We propose the use of the second and third kinds of methods for chance
discovery because they are applicable to audio or video signals, which are
usually noisy. The second kind includes the Hidden Markov Model (HMM) and
Continuous Dynamic Programming (CDP) [67.3] [67.4], which constrain the order
of frames while allowing temporal warping. Those methods can handle
time-series data with a small number of VQ codes and temporal warping, such as
voice retrieval (in this case, the VQ codes are phonemes). The third kind
divides the query into sub-intervals and neglects the order of frames within
each sub-interval, while the order of the sub-intervals should be similar to
that of the query. The main application field of those methods is motion
images or audio signals, which have a huge number of VQ codes compared to the
number of frames contained in a sub-interval. The similarity between such a
query and the database is small over most of the database. Time-Series Active
Search (TAS) [67.2] is the only efficient method in this category. TAS skips
more and more frames as the similarity decreases and achieves quick search
without degrading the retrieval rate.



T. Terano et al. (Eds.): JSAI 2001 Workshops, LNAI 2253, pp. 486-490, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 




67. Retrieval of Similar Time-Series Patterns for Chance Discovery 






TAS compares the histograms of VQ codes in each sub-interval and calculates
the similarity. The method has been applied to the detection of commercials
in TV programs.
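
The skipping idea can be made concrete with a small Python sketch. It is a
simplification written for this explanation, not the implementation of
[67.2]: the whole query is treated as a single sub-interval, the similarity is
assumed to be the histogram intersection, and a match is declared when the
similarity strictly exceeds a threshold `theta`. All names are hypothetical.

```python
from collections import Counter
from math import floor


def histogram_similarity(a, b):
    """Histogram intersection of two equal-length VQ code windows, in [0, 1]."""
    ha, hb = Counter(a), Counter(b)
    return sum(min(ha[c], hb[c]) for c in ha) / len(a)


def active_search(query, database, theta=0.8):
    """Return (position, similarity) for windows with similarity > theta."""
    n = len(query)
    hits, t = [], 0
    while t + n <= len(database):
        s = histogram_similarity(query, database[t:t + n])
        if s > theta:
            hits.append((t, s))
            t += 1
        else:
            # Shifting the window by one frame changes the intersection by at
            # most 1/n, so the next floor(n * (theta - s)) positions cannot
            # exceed theta and are skipped without being evaluated.
            t += floor(n * (theta - s)) + 1
    return hits
```

The lower the similarity at the current position, the larger the skip, which
is why the search is fast over the large parts of the database that do not
resemble the query, while no window above the threshold is missed.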

On the other hand, Reference Interval-free Continuous DP (RIFCDP)
[67.5] has been proposed in order to retrieve similar intervals between two
time-series data sets. This method succeeded in retrieving similar voice
segments directly from an example database [67.6]. RIFCDP belongs to the
second kind of method, which finds temporally warped similar intervals.

However, no reference interval-free method has been proposed for the third
kind of retrieval. If such a method is realized, we can retrieve repeated
programs stored in a huge (e.g., three-month) database of TV broadcasts and
search for a crucial scene replayed in a sports program. Furthermore, such a
retrieval method is also effective for compression, summarization, and
analysis of the database.

Therefore, we propose Reference Interval-free Time-Series Active Search
(RIFAS), which enables quick retrieval of similar intervals hidden in audio or
video signals [67.7]. The basic idea is similar to conventional TAS, but
skipping is performed along not only the input axis but also the reference
axis. TAS can also achieve a function similar to RIFAS if it is repeated while
shifting the interval on the reference; we hereafter call this method TAS
repetition.

In this paper, RIFAS is proposed in section 2. Some approximation methods
of RIFAS are proposed in section 3. Section 4 evaluates RIFAS using
artificial data and motion images, and section 5 concludes.



67.2 Reference Interval-Free Active Search 

RIFAS is more efficient than TAS repetition because it also skips along the
reference axis. Here we define the similarity S(t, τ) between the interval of
frames τ to τ + Nd − 1 in the reference and the interval of frames t to
t + Nd − 1 in the input. The skip width w(t, τ) is then calculated in the same
way as in the active search.

We explain one example of RIFAS, the one realized in this paper, using
Fig. 67.1. Firstly, TAS is applied to the first interval of Nd frames in the
reference. Hereafter we call these Nd frames the search width. Secondly, a
triangular area with height w(t, τ) is made at each matching point. Those
areas are called skip areas. Thirdly, for the next interval of frames 2 to
Nd + 1 in the reference, TAS is applied only to the points outside all the
skip areas. The example shown in Fig. 67.1 has only two matching points.
Furthermore, matching continues from the low valleys between triangles to the
higher ones, as shown at the bottom of Fig. 67.1.

Fig. 67.1. Reference Interval-free Active Search



67.3 Experiments 

In this section, we evaluate RIFAS and TAS repetition using an image sequence
captured from TV programs. A personal computer (OS: Windows 2000, CPU: K6
400 MHz) was used for these experiments. We used a seven-hour black-and-white
image sequence (752399 frames), captured at 30 frames per second from TV
programs. The data was converted into a VQ sequence (numbers from 0 to L − 1).
Firstly, the average value of the whole image is quantized into 16 (2^4)
levels. Secondly, the image is divided into three by three blocks and the
average value of each block is calculated. Lastly, each block average divided
by the maximum block average is quantized into 4 (2^2) levels. The same
sequence was used as both the reference and the input, and similar intervals
were searched for inside the image sequence data. Therefore the search area is
restricted to the lower right triangular area of the (t, τ) plane.
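
The quantization just described can be sketched in Python as follows. The
sketch rests on assumptions that the text does not state: pixel values are
taken to lie in 0..255, quantization is uniform, and the ten quantized values
(one whole-image level and nine block levels) are packed into a single
integer. Under this reading a code needs 4 + 9 × 2 = 22 bits. The function
names are hypothetical.

```python
def quantize(value, levels, max_value):
    """Map a value in [0, max_value] to an integer level in [0, levels - 1]."""
    if max_value <= 0:
        return 0
    return min(int(value / max_value * levels), levels - 1)


def mean(values):
    return sum(values) / len(values)


def vq_code(image, max_pixel=255):
    """Encode a grayscale frame (list of equal-length rows, at least 3 x 3)."""
    h, w = len(image), len(image[0])
    # 16 levels for the average brightness of the whole frame.
    code = quantize(mean([p for row in image for p in row]), 16, max_pixel)
    # Average brightness of each of the 3 x 3 blocks.
    blocks = []
    for by in range(3):
        rows = image[by * h // 3:(by + 1) * h // 3]
        for bx in range(3):
            cols = slice(bx * w // 3, (bx + 1) * w // 3)
            blocks.append(mean([p for row in rows for p in row[cols]]))
    # 4 levels for each block average relative to the largest block average.
    top = max(blocks)
    for b in blocks:
        code = code * 4 + quantize(b, 4, top)
    return code
```

Because the block averages are normalized by the largest one, the block part
of the code describes the spatial layout of brightness, while the overall
brightness is carried only by the first 16-level component.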

We examined this data and made a list of similar intervals. The search
width was fixed to Nd = 450 (corresponding to 15 sec.). The ground
truth list was then made by selecting the true intervals longer than 450 × 6
frames. The threshold was varied as θ = 0.9, 0.8, ..., 0.3, and the entropy
threshold as θe = 0, 0.05, 0.1, ..., 0.95 for RIFASskip (a = 1). The
comparison between RIFAS and TAS repetition was made at the parameters giving
the maximum detection rate.

The detection rate is calculated by averaging the precision rate Nc/ND and
the recall rate Nc/NT. Here, Nc, ND, and NT are the numbers of correct
intervals, detected intervals, and ground truth intervals. An interval is
counted as correct when its interval detection rate is more than 0.3. This
means that the shift in frames is less than 50% of the interval length.
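
As a worked illustration of this measure, the following Python function
computes the average of precision and recall from the three counts. The
numbers in the usage note are invented for the example and are not
experimental results.

```python
def detection_rate(n_correct, n_detected, n_truth):
    """Average of precision (correct / detected) and recall (correct / truth)."""
    precision = n_correct / n_detected
    recall = n_correct / n_truth
    return (precision + recall) / 2
```

For instance, with 8 correct intervals among 10 detected ones and 16 ground
truth intervals, precision is 0.8, recall is 0.5, and the detection rate is
their mean.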

The results of RIFASskip are shown in Fig. 67.2. The vertical axis is the
threshold θ. At each θ, the entropy threshold θe was changed and the maximum
rate was plotted. The detection rate was highest (84.4%) at θ = 0.5. The VQ
difference in dynamic scenes is one reason for the decrease of this θ
compared to the artificial experiments. The VQ also differs from human
perception, which lowers the detection rate.

Fig. 67.2. Detection rate of RIFASskip (7 hours of motion image data).

Table 67.1. Experimental results of three methods by motion image data.

                       TASrep.   RIFAS   RIFASskip
Detection rate (%)     83.1      82.8    84.4
Search time (min.)     384       53      12


The results of the three methods, TAS repetition, RIFAS, and RIFASskip,
at θ = 0.5 are shown in Table 67.1. The results show that RIFASskip reduced
the computation time to about 1/30 of that of TAS repetition and about 1/4 of
that of RIFAS without any decrease in the detection rate.



67.4 Summary 

The application area of chance discovery should be enlarged by the proposed
use of RIFCDP or RIFAS for video or audio signals. Future work is to
introduce a method more efficient than tiling the search area with triangles,
for instance using rectangles instead. The feature extraction method is also
a challenge, and more sophisticated methods are expected. A two-dimensional
RIFAS for image retrieval is also necessary.



This chapter describes fuzzy knowledge based systems and intelligent control
in the light of chance discovery. It also gives some hints on the application
of chance discovery from this perspective.

Keywords: Chance discovery, fuzzy knowledge based systems, 
control systems, level-two control systems 



68.1 Introduction 

One of the most successful applications of fuzzy logic is rule-based systems.
These Fuzzy Knowledge Based Systems are nowadays commonly applied to
control and modeling applications. In the typical application, Fuzzy
Knowledge Based Systems are defined by means of a set of flat rules (with
fuzzy predicates in the antecedents and the consequents) that are all applied
at the same time. Systems of this kind present some difficulties when used to
control or model complex systems. Unpredicted changes in the environment
and the usually large set of variables are the most noticeable difficulties that
these systems have to face. Intelligent control and hierarchical systems are
two of the approaches used for controlling and modeling complex systems.

Recently, Chance Discovery was introduced as a new research field that
focuses on some of the issues that are difficult to deal with in Data Mining.
In particular, chance discovery aims to provide means to detect and take
advantage of new situations [68.6].

Chance discovery is described in [68.7] as anticipation, and quoting [68.1]
the latter is: “inventing or creating new alternatives where none existed
before”. Chance discovery, from the perspective of a Fuzzy Knowledge Based
System designer, can correspond to the determination of the right moment
for making a radical change in the system so that its performance increases.
Accordingly, we can foresee its application to selecting a new rule base
when the initial conditions of a system have changed.

In this chapter we elaborate on the application of Chance Discovery to
Fuzzy Knowledge Based Systems. In Section 68.2, we describe the difficulties
of developing Fuzzy Knowledge Based Systems for complex domains and
some of the approaches considered in the literature to solve them. Then, in
Section 68.3, we give a general architecture for level-two intelligent control
and point out some relations with Chance Discovery. The chapter finishes in
Section 68.4 with some conclusions.





Fig. 68.1. Architecture of the system 



68.2 Fuzzy Knowledge Based Systems 

Fuzzy Knowledge Based Systems are one of the most successful applications 
of fuzzy logic and fuzzy sets technology. These systems are usually defined in 
terms of fuzzy rules (rules in which fuzzy terms are used in their antecedent 
and consequent part). Typical applications are control and modeling. See, for 
example, [68.3], [68.5] for details. 

When the application domain moves from simple systems to complex ones,
the usual operation procedure becomes infeasible. Note that typical simple
applications are defined using a flat set of rules. In this case, all rules are
applied at once and the final output is computed by means of a defuzzification
of the combination of the conclusions of the rules.

In complex systems, two main difficulties arise. They relate to the number
of variables of the system and to the application domain:

1. The number of variables is usually large, which causes the number
of required rules to increase exponentially. This is so because typically the
number of rules is m^n, where n is the number of variables and m is the
average number of terms for each variable. This problem is the so-called
curse of dimensionality.

2. The typical environment is usually a changing one. Moreover, these
changes cannot be modeled in an easy way using variables. In this case, the
performance of a system can decline as soon as the properties of the
environment move away from the foreseen ones.

Techniques have been developed to deal with these difficulties. 

1. To deal with the curse of dimensionality, hierarchical fuzzy systems have
been developed (see [68.10], [68.9], [68.8] for details). These systems
replace a large rule base by a set of smaller and modular rule bases. These
smaller rule bases are connected by means of additional variables and
inference is chained among modules of rules.

2. To deal with the changing environment, adaptive intelligent control has
been developed (see e.g. [68.4]). These systems are able to adapt the rules
when the environment changes.
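
The effect of the hierarchical decomposition on the size of the rule base can
be illustrated with a small calculation. The flat count m^n is the one given
above; the hierarchical count assumes one particular layout that is not taken
from the cited works, namely a chain of n − 1 two-input modules in which every
intermediate variable also has m terms. Both function names are hypothetical.

```python
def flat_rule_count(n_vars, n_terms):
    """Rules in a flat rule base covering every combination of terms: m ** n."""
    return n_terms ** n_vars


def chained_rule_count(n_vars, n_terms):
    """Rules in a chain of (n - 1) two-input modules with m ** 2 rules each."""
    return (n_vars - 1) * n_terms ** 2
```

With n = 6 variables and m = 5 terms, the flat rule base needs 15625 rules,
whereas the assumed chain needs 125, so the growth becomes linear rather than
exponential in the number of variables.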

Karr [68.4] considers four distinct levels of intelligent control based on the 
adaptability of the system to the environment. These levels are the following 
ones: 

Level-zero intelligent control system: Corresponds to a system that can
improve its tracking error. That is, given a desired value for a system
variable, the system is capable of reducing the difference between the actual
value and the desired one.

Level-one intelligent control system: Corresponds to a system that, besides
controlling the tracking error, is capable of making self-improvements to
the coefficients used in the control system. Some of the mechanisms for
achieving this type of control are neural networks and fuzzy systems.

Level-two intelligent control system: An internally generated performance
measure is used, and optimized, at the same time as the tracking error
is driven to zero.

Level-three intelligent control system: Corresponds to a system that includes
a planning function. Such systems have the ability to plan ahead for certain
situations, and can also autonomously simulate and model uncertainties
that might appear in the system being controlled.

As an example, we can consider the well-known inverted pendulum. A 
change in the conditions of the pendulum (e.g. a change of its mass) can 
invalidate the rules (this example is considered in [68.4]). In this case, a 
level-two intelligent control system has to detect as soon as possible from 
the available variables (via some indicators) that the system has changed 
and update all the rules accordingly. In Section 68.3 we give an architecture 
to model this type of process. 



68.3 System Architecture 

In this section we describe a system architecture for level-two intelligent 
control in the light of Chance Discovery. To do so, we consider that a certain 
environment can be modeled using a set of variables and inspected through 
the values of this set together with the values of a set of indicators. It is 
assumed that the variables and the indicators define the state of the 
environment. Performance is considered a special indicator that gives a 
general overview of the system and that can be computed as a function of the 
other indicators (and the variables). Under these assumptions, the goal of the 
control system is to achieve a good (optimum) performance. In the light of 
chance discovery, we consider a chance as a period in which a radical 
modification of the Knowledge Based System^ can have a large and positive 
effect on performance. 

Our approach to detecting chances consists of real-time monitoring of the 
variables of the application domain. This detection requires some knowledge: 

1. Information on the domain, that is, domain knowledge. This domain 
knowledge describes all the relevant aspects related to the variables and 
the indicators: how the variables are related to one another and how 
they influence the indicators. It is used to compute the indicators 
from the variables. 

2. Information on the actions the system can perform to influence a 
particular variable in the environment, that is, a description of the 
capabilities of the system, so that only relevant chances (the ones that 
can be of use) are detected. 

It is clear that the more accurate the model is, the more chances can 
be discovered by the system. The need for domain knowledge and for a model 
of the actions and their outcomes makes this architecture analogous to the 
one in model based systems [68.2]. There, instead of modeling actions, 
possible failures are modeled. 

Accordingly, a Fuzzy Knowledge Based System with a chance discovery 
model would follow the architecture given in Figure 68.1. Several 
monitoring elements, or agents, monitor the variables in such a way that at 
each time period it is checked whether an alteration of the variable value 
can cause an increase in performance. Each monitoring element would 
use the domain model (DM) to compute the values of the indicators (i1, i2) 
and the performance (Perf) from the variables (v1, v2, v3, v4, v5). Using this 
information and the model of the behavior of the variable in relation to the 
available actions for the system (fv2), the monitoring element would decide 
whether at the present time there is a chance for improving the performance. 
If so, the appropriate action is applied. 
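
The decision taken by one monitoring element can be sketched as follows. This is only an illustration under stated assumptions: the function `detect_chance`, the toy performance function standing in for the domain model, and the list of values reachable through the available actions are all hypothetical and do not come from the architecture description itself.

```python
from typing import Callable, Dict, Iterable, Optional

State = Dict[str, float]


def detect_chance(
    state: State,
    variable: str,
    reachable_values: Iterable[float],
    performance: Callable[[State], float],
    min_gain: float = 0.0,
) -> Optional[float]:
    """Return the reachable value of `variable` with the largest predicted
    performance gain above `min_gain`, or None when there is no chance."""
    baseline = performance(state)
    best_value, best_gain = None, min_gain
    for value in reachable_values:
        trial = dict(state)          # hypothetical state after the action
        trial[variable] = value
        gain = performance(trial) - baseline
        if gain > best_gain:
            best_value, best_gain = value, gain
    return best_value


def toy_performance(state: State) -> float:
    """Toy domain model (assumption): one indicator i1 = v1 + v2,
    and performance is best when i1 equals 10."""
    i1 = state["v1"] + state["v2"]
    return -abs(i1 - 10.0)


state = {"v1": 4.0, "v2": 3.0}
# Values of v2 that the (assumed) available actions could produce.
print(detect_chance(state, "v2", [2.0, 3.0, 5.0, 6.0], toy_performance))
```

Restricting the search to values that the system's actions can actually produce reflects the requirement that only relevant chances are detected.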



68.4 Conclusions 

In this work we have reviewed fuzzy knowledge based systems for complex 
systems. We have described an architecture for level-two fuzzy systems that 
needs intensive domain knowledge to detect chances for improving the 
performance of the system. On the one hand, the need to detect the chances 
makes this application suitable for Chance Discovery. On the other hand, 
the need for intensive domain knowledge (knowledge that relates state 
variables and indicators and that points out which actions have to be 
undertaken and when) relates this approach to model based reasoning [68.2]. 
Further work is needed to study the suitability of the approach presented 
here and its adequacy in level-two fuzzy control. Also, from the point of view 
of chance discovery, further work is needed to clearly identify the 
differences and coincidences with other artificial intelligence techniques 
such as model based reasoning. 

^ We consider here only radical modifications of the Knowledge Based System 
(e.g. addition or suppression of rules) and not minor changes. 

Research on knowledge discovery has attracted many people in recent years, 
and there is no doubt about its importance. Many benchmark data sets have 
been provided to the recent discovery challenge meetings at the KDD, PKDD, 
PAKDD and JSAI conferences. Many researchers have tackled these data sets 
and presented their results. These efforts have indicated the high potential 
of various knowledge discovery approaches. However, the diversity of the 
approaches and the data sets is so large that the significance of the 
resulting knowledge has not been extensively evaluated through close 
collaboration between the data analysts and the domain experts. 

The aim of this workshop was to tackle a set of data through close 
collaboration among many analysts and an expert in the data domain, and to 
evaluate the possibility of discovering significant knowledge in such an 
integrated knowledge discovery process. The data set for the challenge was 
provided by a medical doctor, Prof. Shusaku Tsumoto (Shimane Medical 
University), who is the domain expert on meningoencephalitis diagnosis 
and a member of the supervisory board of this workshop. The data set was 
obtained from meningoencephalitis diagnosis activity in a hospital. It was 
selected by considering the availability of collaborating domain experts 
and the applicability of various approaches to the data. 

The workshop strongly encouraged the participants to perform the 
following tasks in collaboration with the domain expert. 

1. Presentation of discovery approaches by analyst participants 

2. Presentation of preliminary results by analyst participants 

3. Presentation of evaluation and comments on the preliminary results by 
domain experts 

4. Presentation of the final results by analyst participants in the workshop 
meeting, taking into account the above evaluation and comments given by 
the domain expert. 

5. Presentation of the final evaluation and comments by domain experts in 
the workshop meeting 

Steps 1 to 3 were conducted through direct and/or electronic discussions 
before the workshop meeting. Steps 2 and 3 were repeated if further 
analyses were required. In step 5, the evaluation was made in terms of 
the significance of the discovered knowledge in the data domain. 

I believe that a KDD challenge of this kind, with close collaboration 
between participants and a domain expert, had never been planned before. 
I also believe that all participants had stimulating and fruitful experiences 
and discussions, and had a pleasant stay at JKDD01. 

We present CAMLET, a platform for the automatic composition of 
inductive applications using ontologies that specify inductive learning 
methods. CAMLET constructs inductive applications using process and object 
ontologies. We have applied CAMLET to a meningoencephalitis dataset and 
evaluated it. The experimental results show that it supports a human expert 
in discovering knowledge of interest to him. 



70.1 Introduction 

During the last twenty years, many inductive learning systems, such as ID3 
[70.3], Classifier Systems [70.1] and data mining systems, have been 
developed, exploiting many inductive learning algorithms. As a result, 
end-users of inductive applications are faced with a major problem: model 
selection, i.e., selecting the best model for a given data set. Conventionally, 
this problem is resolved by trial and error or by heuristics such as a 
selection table for ML algorithms. This solution sometimes takes much time, 
so automatic and systematic guidance for constructing inductive applications 
is really required. 

Given this background, it is time to decompose inductive learning 
algorithms and organize inductive learning methods (ILMs) for reconstructing 
inductive learning systems. Given such ILMs, we may construct a new 
inductive application that works well on a given data set by re-interconnecting 
ILMs. The issue is to meta-learn an inductive application that works well on 
a given data set. Thus this paper focuses on specifying ILMs in an ontology 
for learning processes (called a process ontology here) and an object 
ontology for the objects manipulated by learning processes. After constructing 
these two ontologies, we design a computer aided machine (inductive) learning 
environment called CAMLET and evaluate the competence of CAMLET using 
several case studies from the database on meningoencephalitis with a human 
expert's evaluation. 



70.2 Ontologies for Inductive Learning 

Considerable time and effort have been devoted to analyzing the following 
popular inductive learning systems: Version Space [70.2], AQ15, ID3 [70.3], 
C4.5 [70.4], Classifier Systems [70.1], Back Propagation Neural Networks, 
Bagged C4.5 and Boosted C4.5 [70.5]. The analysis first produced only 
unstructured documents articulating which inductive learning processes 
are in the above systems. Sometimes it was hard to decide on a proper grain 
size for inductive learning processes. In this analysis, we required that the 
inputs and outputs of inductive learning processes be data sets or rule sets. 
When just a single datum or rule is the input or output of a process, it was 
considered too fine-grained to be a process. 

An ontology is an explicit specification of a conceptualization. In this 
paper, a process ontology is an explicit specification of a conceptualization 
of inductive learning processes, and an object ontology is one of the objects 
manipulated by them. In structuring many inductive learning processes into 
a process ontology, we obtained the following sub-groups, in which similar 
inductive learning processes come together at the above-mentioned grain size: 
“generating training and validation sets”, “generating a rule set”, 
“estimating data and rule sets”, “modifying a training data set” and 
“modifying a rule set”, with the top-level control structure as shown in 
Figure 70.1. 

Fig. 70.1. Top-level Control Structure 



In order to specify the conceptual hierarchy of a process ontology, it is 
important to identify how to branch down processes. Because the upper part 
is related to general processes and the lower part to specific processes, it 
is necessary to set up different ways to branch the hierarchy down, depending 
on the level of the hierarchy. 

In specifying the lower part of the hierarchy, the above abstract 
components have been divided further using characteristics specific to each. 
For example, “generating a rule set” has been divided into “(generating a 
rule set) dependent on training sets” and “(generating a rule set) independent 
of training sets” according to the dependency on training sets. Thus we have 
constructed the conceptual hierarchy of the process ontology, as shown in 
Figure 70.2. In Figure 70.2, leaf nodes correspond to the library of executable 
program code written in C. Here “a void validation set” denotes that the 
learning set is not split into training and validation sets and that a 
learning system uses the training set instead of a validation set when it 
estimates a rule set at the learning stage; “window strategy” denotes that a 
training set is refined using an extra validation set that does not fit the 
existing rules. 

Fig. 70.2. Hierarchy of Process Ontology 



70.3 Basic Design of CAMLET 

Figure 70.3 shows the basic activities for knowledge systems construction 
using problem solving methods (PSMs) [70.6]. In this section, we apply the 
basic activities to constructing inductive applications using process and 
object ontologies. 

Fig. 70.3. Basic Activities 



The construction activity constructs an initial specification for an 
inductive application. CAMLET selects a top-level control structure for an 
inductive learning system by selecting any path from “start” to “end” in 
Figure 70.1. Afterwards CAMLET retrieves the leaf-level processes subsumed 
by the selected top-level processes, checking the interconnections through 
the pre-process and post-process roles of the selected leaf-level processes. 
Thus CAMLET constructs an initial specification for an inductive application, 
described by leaf-level processes in the process ontology. 

The instantiation activity fills in input and output roles of leaf-level pro- 
cesses from the initial specification, using data types from a given data set. 
The values of other roles, such as reference, pre-process and post-process, 
have not been instantiated but come directly from process schemes. Thus 
an instantiated specification comes up. Additionally, the leaf-level processes 
have been filled in the process-list roles of the objects identified by the data 
types. 

The compilation activity transforms the instantiated specification into 
executable code using a library for ILMs. When one process is connected 
to another at the implementation level, the specifications of their I/O data 
types must be unified. To do so, this activity has a data conversion 
facility that, for example, converts a decision tree into a classifier. 

The test activity tests whether the executable code for the instantiated 
specification performs well, checking it against the requirement (accuracy) 
from the user. The resulting estimate is used to carry out the refinement 
activity efficiently. 

Figure 70.4 summarizes the above-mentioned activities. A user gives a 
learning set and a goal accuracy to CAMLET. CAMLET constructs the 
specification for an inductive application, using process and object ontologies. 
When the specification does not perform well, it is refined into another one 
with better performance by crossover of control structures, random generation, 
or replacement of system components. To be more specific, when a system's 
performance is higher than a threshold (= 0.7 × goal accuracy), CAMLET 
executes the replacement of system components. Otherwise, when the system 
population size N is equal to or larger than a threshold (N ≥ 4), CAMLET 
executes crossover of control structures; otherwise, it executes random 
generation. All the systems refined by these three strategies are added to 
the system population. As a result, CAMLET may (or may not) generate an 
inductive application that satisfies the user's target accuracy. When it 
performs well, the inductive application can learn a set of rules that work 
well on the given learning set. 
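
The choice among the three refinement strategies can be summarized in a few lines. The sketch below is only a reading of the rule stated above; the function name, the strategy labels and the parameter names are assumptions introduced for illustration, while the factor 0.7 and the population threshold 4 are the values given in the text.

```python
def choose_refinement(
    performance: float,
    goal_accuracy: float,
    population_size: int,
    ratio: float = 0.7,          # performance threshold = 0.7 * goal accuracy
    min_population: int = 4,     # population threshold from the text
) -> str:
    """Pick the refinement strategy for a specification that missed the goal."""
    if performance > ratio * goal_accuracy:
        # Close to the goal: keep the control structure, swap components.
        return "replace_components"
    if population_size >= min_population:
        # Enough candidate systems exist to recombine their control structures.
        return "crossover"
    # Too few systems so far: generate a new one at random.
    return "random_generation"


print(choose_refinement(0.80, 0.90, population_size=2))
print(choose_refinement(0.50, 0.90, population_size=5))
print(choose_refinement(0.50, 0.90, population_size=3))
```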



70.4 A Case Study of Knowledge Discovery Support 
Using a Meningoencephalitis Dataset 

We apply CAMLET to a meningoencephalitis dataset in order to evaluate 
how much CAMLET supports a human expert in discovering interesting 
knowledge. The dataset consists of 140 cases, and all the cases are described 
by 38 attributes, including present and past history, laboratory examinations, 
final diagnosis, therapy, clinical courses and final status after the therapy. 
The important issues in analyzing this dataset are as follows: to find factors 
important for diagnosis (DIAG and Diag2), ones for detection of bacteria 
or virus (CULT_FIND and CULTURE) and ones for predicting prognosis 
(C_COURSE and COURSE). 

Fig. 70.4. An Overview of CAMLET 

70.4.1 Learning Rules from the View of Precision 

CAMLET has been applied to a meningoencephalitis dataset, and a medical 
expert (Prof. Shusaku Tsumoto from Shimane Medical University) has given 
comments on the rules learned by the inductive applications constructed by 
CAMLET. Table 70.1 shows the results of the six case studies mentioned 
above. This table includes the following items: the case study identification 
(the first column), the rough specification of the inductive application 
constructed by CAMLET (the second column), the best precision of the learned 
rule set (the third column), the number of learned rules judged ordinary 
(less interesting) (the fourth column), the number of learned rules judged 
more interesting (the fifth column), the number of learned rules judged 
difficult to understand (the sixth column) and the total number of learned 
rules (the last column). 

Looking at the specification structures of the inductive applications 
constructed by CAMLET, although decision tree learning methods, such as ID3 
and C4.5, appear in all the cases, the overall control structures differ 
from one specification to another: some use bagging and others boosting, 
some have backward control structures and others do not, and so on. Thus 
CAMLET seems to adapt the inductive applications to the meningoencephalitis 
dataset. However, CAMLET cannot support a human expert in 
discovering interesting knowledge efficiently. We need to introduce heuristics 
so that the number of interesting rules is larger than that of ordinary rules 
over all the case studies. 

Table 70.1. Experimental Results for Knowledge Discovery 

case        rough specification      precision   # of o   # of i   # of l   total 
DIAG        Boosting + ID3 + CS         92.1       38       15        0       53 
Diag2       Boosting + ID3             100.0       27        8        0       35 
CULTURE     Bagging + ID3 (prune)       80.0       48        7        2       57 
CULT_FIND   ID3                         92.1       18        3        3       24 
C_COURSE    Bagging + C4.5 (prune)      85.0       19        3        1       23 
COURSE      Bagging + C4.5 (prune)      88.6       39       12        3       54 

o means “ordinary rules”. 
i means “interesting rules”. 
l means “less understandable rules”. 



70.4.2 Learning Rules from the View of Specificity 

A default hierarchy is often used in genetics-based machine learning systems 
such as classifier systems; it organizes a set of rules into a hierarchy 
from general rules down to specific rules. On the one hand, although general 
rules cover many instances, they tend to be of less interest or surprise to 
human experts. On the other hand, although specific rules cover just a few 
instances, they could be of more interest or surprise to them. 
So, in order to support a human expert in finding interesting rules, we 
apply the default hierarchy to the rules learned by the inductive applications 
constructed by CAMLET. This default hierarchy consists of a set of paths from 
general rules down to specific rules that do not share the same conclusion. 

The medical expert gives some comments on four paths from general rules 
to specific rules in a learned default hierarchy. The first path starts with a 
general rule on CULT_FIND as follows: 

LOC_DAT = + 
∧ 122.0 < CSF_CELL7 → CULT_FIND = T 
precision : 78.5% (11/14) 
recall : 33.3% (11/33) 

This general rule means that if a patient has loss of consciousness (LOC) and 
the cell count in cerebrospinal fluid seven days after the treatment 
(CSF_CELL7) is more than 122.0, then bacteria or virus is found (CULT_FIND). 
The medical expert gives the following comment on this rule: it is all right 
but not interesting to me. 
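
The precision and recall figures attached to each rule follow directly from the counts in parentheses. The short sketch below recomputes them for the first general rule (11 correctly covered cases, 14 covered cases, 33 cases with CULT_FIND = T); the function name is an illustrative assumption.

```python
def precision_recall(correct: int, covered: int, class_size: int) -> tuple:
    """precision = correct / covered, recall = correct / class_size."""
    return correct / covered, correct / class_size


# Counts reported for the first general rule on CULT_FIND.
precision, recall = precision_recall(correct=11, covered=14, class_size=33)
print(f"precision = {precision:.4f}, recall = {recall:.4f}")
```

Note that 11/14 is approximately 0.7857, so the reported 78.5% corresponds to truncating rather than rounding the percentage.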

Going down the path and adding the condition SEX = M, the 
following specific rule comes up: 

LOC_DAT = + 
∧ 122.0 < CSF_CELL7 
∧ SEX = M → CULT_FIND = F 
precision : 100.0% (3/3) 
recall : 2.8% (3/107) 

This specific rule means that if a patient is male (the added condition), in 
addition to the previous conditions, then bacteria or virus is not found. 
The medical expert gives the following comment on the rule: it seems to be 
unexpected and interesting to me. 

The second path starts with a general rule on CULT_FIND as follows: 

→ CULT_FIND = F 
precision : 76.4% (107/140) 
recall : 100.0% (107/107) 

This general rule means that, without any conditions, bacteria or virus is 
not found in 76.4% of cases. The medical expert gives the following comment 
on this rule: it is all right but not interesting to me. 

Going down the path and adding the condition SEIZURE < 1.0, 
the following specific rule comes up: 

SEIZURE < 1.0 → CULT_FIND = T 
precision : 96.9% (32/33) 
recall : 96.9% (32/33) 

The medical expert gives the following comment on this rule: it seems to be 
unexpected that if convulsion or epilepsy (SEIZURE) is not observed, then 
it comes to T. 

The third path starts with a general rule on COURSE(Grouped) as follows: 

42.0 < CSF_GLU → COURSE(Grouped) = n 
precision : 86.2% (100/116) 
recall : 85.4% (100/117) 

This general rule means that if the glucose level in cerebrospinal fluid 
(CSF_GLU) is above 42.0, then the clinical course at discharge 
(COURSE(Grouped)) has no symptoms. The medical expert gives the following 
comment on this rule: it is all right but not interesting to me. 

Going down the path and adding the condition 15.0 < GCS, the 
following specific rule comes up: 

42.0 < CSF_GLU 
∧ 15.0 < GCS → COURSE(Grouped) = p 
precision : 100.0% (16/16) 
recall : 69.5% (16/23) 

This specific rule means that if the Glasgow Coma Scale (GCS), which is a 
score to evaluate the degree of loss of consciousness, is more than 15.0, 
then the conclusion of the rule is reversed. The medical expert gives the 
following comment on this rule: it seems to be unclear, but it is an open 
question to me. 

The last path starts with a general rule on COURSE(Grouped) as follows: 

LOC < 6.0 → COURSE(Grouped) = n 
precision : 85.8% (115/134) 
recall : 98.2% (115/117) 

This general rule means that if a patient with loss of consciousness (LOC) 
came to the hospital within six days after LOC was observed, then the clinical 
course at discharge (COURSE(Grouped)) has no symptoms. The medical expert 
gives the following comment on this rule: it is all right but not interesting 
to me. 

Going down the path and adding the condition SEIZURE < 1.0, 
the following specific rule comes up: 

LOC < 6.0 
∧ SEIZURE < 1.0 → COURSE(Grouped) = p 
precision : 94.7% (18/19) 
recall : 78.2% (18/23) 

The medical expert gives the following comment on this rule: the combination 
of conditions seems to be unexpected, but it is an open question to me. 

Thus some rules learned by the inductive applications constructed by 
CAMLET are unexpected and interesting to the medical expert. 



70.5 Conclusions and Future Work 

In the case studies of knowledge discovery support using a 
meningoencephalitis dataset, we have evaluated CAMLET from two points of 
view: precision and specificity. In particular, from the latter point of view, 
using a default hierarchy, some learned rules are interesting to a medical 
expert. Although we obtain some learned rules that are interesting to a 
medical expert, overly specific rules have much less coverage and are of no 
interest to him. We will extend the viewpoints used to evaluate learned rules 
in order to support human experts in discovering interesting knowledge. 

Acknowledgement. We are very grateful to Dr. Shusaku Tsumoto, who 
gave us the dataset on meningoencephalitis together with his evaluation of 
the rules generated by our environment. 



71.1 Introduction 

The meningitis dataset has been used for extracting meningitis knowledge by 
learning and mining methods. This paper reports the result of extracting knowl- 
edge from this dataset by a novel learning method called LUPC that integrates 
separate-and-conquer rule induction with association rule mining. We first briefly 
introduce the basic ideas of LUPC then describe experiments, extracted knowl- 
edge and the result evaluation. The extracted knowledge is concerned with factors 
important for diagnosis (DIAG and DIAG2), for detection of bacteria or virus 
(CULT_FIND and CULTURE) and for predicting prognosis (C_COURSE and 
COURSE). 



71.2 LUPC: Learning Unbalanced Positive Class 

Consider the rule induction problem where we focus on learning a minority target 
class seen as the positive class $C^+$, whose instances are denoted by Pos, and all 
other classes as the negative class $C^-$, whose instances are denoted by Neg, i.e., 
$|Pos| \ll |Neg|$. Denote by $cov(R)$ the set of instances covered by a rule $R$, 
which is divided into two subsets of covered instances in Pos and Neg: 
$cov(R) = cov^+(R) \cup cov^-(R)$. Our task is to find a set of predictive and 
descriptive rules for $C^+$, denoted by $R^+ = \{R^+_1, R^+_2, \ldots, R^+_k\}$, so that 
$Pos \subseteq cov(R^+_1) \cup cov(R^+_2) \cup \ldots \cup cov(R^+_k)$ and the 
discovered rules are “best” in terms of high sensitivity as well as positive 
predictive value, and low false positive rate. Given thresholds $\alpha$ and $\beta$ 
for accuracy and coverage ratio, a rule $R$ is $\alpha\beta$-strong if 
$acc(R) \ge \alpha$ and $|cov^+(R)|/|D| \ge \beta$. Table 1 presents the scheme of 
algorithm LUPC for solving the above problem effectively. There are three essential 
features of LUPC that make it possible to learn minority classes in unbalanced 
datasets efficiently. Firstly, it carries out a search biased alternately toward 
accuracy and cover ratio with adaptive thresholds. Secondly, it performs 
separate-and-conquer induction in the target class, exploiting the unbalanced 
property of datasets, which allows trying beam search with a large beam search 
parameter and one-sided selection. The following property gives the necessary 
constraint on $cov^-(R)$ for a rule $R$ to be $\alpha\beta$-strong in terms of 
$cov^+(R)$ and the accuracy threshold. It is used to reduce the time spent scanning 
the large Neg when generating and selecting candidate rules for $C^+$: given 
$\alpha$, a rule $R$ is not $\alpha\beta$-strong for any $\beta$ if 
$|cov^-(R)| > ((1-\alpha)/\alpha) \times |cov^+(R)|$. Thirdly, LUPC integrates 
pre-pruning and post-pruning in a way that can avoid over-pruning. 
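
The strength test and the pruning property can be written down in a few lines. The sketch below is illustrative only: the function names are assumptions, accuracy is taken to be the fraction of covered instances that are positive, and the thresholds are treated as inclusive lower bounds.

```python
def accuracy(pos_covered: int, neg_covered: int) -> float:
    """acc(R): fraction of the covered instances that are positive."""
    total = pos_covered + neg_covered
    return pos_covered / total if total else 0.0


def is_alpha_beta_strong(pos_covered: int, neg_covered: int,
                         dataset_size: int, alpha: float, beta: float) -> bool:
    """R is alpha-beta-strong when it meets both thresholds (assumed inclusive)."""
    return (accuracy(pos_covered, neg_covered) >= alpha
            and pos_covered / dataset_size >= beta)


def can_prune(pos_covered: int, neg_covered: int, alpha: float) -> bool:
    """Pruning property: if the rule covers more than ((1 - alpha) / alpha)
    times as many negatives as positives, it cannot reach accuracy alpha,
    so scanning further negative instances is unnecessary."""
    return alpha * neg_covered > (1.0 - alpha) * pos_covered


# A rule covering 12 positives and no negatives among 140 cases.
print(is_alpha_beta_strong(12, 0, 140, alpha=0.95, beta=2 / 140))
# A rule covering 12 positives and 4 negatives cannot reach accuracy 0.8.
print(can_prune(12, 4, alpha=0.8))
```

The pruning test needs only the running count of covered negatives, so a scan over Neg can stop as soon as the bound is exceeded.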



Table 1. The scheme of algorithm LUPC 

Learn-positive-rule(Pos, Neg, minacc, mincov) 
 1. RuleSet = ∅ 
 2. α, β ← Initialize(Pos, Neg, minacc, mincov) 
 3. while (Pos ≠ ∅ and (α, β) ≠ (minacc, mincov)) 
 4.     NewRule ← BestRule(Pos, Neg, α, β) 
 5.     if (NewRule ≠ ∅) 
 6.         Pos ← Pos \ Cover+(NewRule) 
 7.         RuleSet ← RuleSet ∪ NewRule 
 8.     else Reduce(α, β) 
 9. RuleSet ← PostProcess(RuleSet) 
10. return(RuleSet) 

Procedure BestRule(Pos, Neg, α, β) 
11. CandidateRuleSet = ∅ 
12. AttributeValuePairs(Pos, Neg, α, β) 
13. while StopCondition(Pos, Neg, α, β) 
14.     CandidateRules(Pos, Neg, α, β) 
15. BestRule ← First CandidateRule in CandidateRuleSet 
16. return(BestRule) 
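
The covering loop of Table 1 can be illustrated with a much simplified, runnable sketch. It is not the LUPC implementation: rules are restricted to a single attribute-value test, the beam search of BestRule is replaced by an exhaustive scan, only the accuracy threshold is relaxed (by an assumed fixed step), post-processing is omitted, and all function and parameter names are hypothetical.

```python
def best_rule(pos, neg, alpha, beta, dataset_size):
    """Best single (attribute, value) test meeting both thresholds, or None."""
    candidates = sorted({(a, v) for row in pos for a, v in row.items()})
    best, best_key = None, None
    for attr, value in candidates:
        p = sum(1 for row in pos if row.get(attr) == value)
        n = sum(1 for row in neg if row.get(attr) == value)
        acc = p / (p + n)            # p >= 1 because candidates come from pos
        if acc >= alpha and p / dataset_size >= beta:
            key = (acc, p)           # prefer accuracy, then positive coverage
            if best_key is None or key > best_key:
                best, best_key = (attr, value), key
    return best


def learn_positive_rules(pos, neg, minacc, mincov, start_acc=1.0, step=0.05):
    """Separate-and-conquer: find a rule, remove the positives it covers,
    and relax the accuracy threshold when no rule qualifies."""
    dataset_size = len(pos) + len(neg)
    remaining, rules = list(pos), []
    alpha, beta = start_acc, mincov
    while remaining:
        rule = best_rule(remaining, neg, alpha, beta, dataset_size)
        if rule is not None:
            attr, value = rule
            remaining = [r for r in remaining if r.get(attr) != value]
            rules.append(rule)
        elif alpha > minacc:
            alpha = max(minacc, alpha - step)   # assumed form of Reduce
        else:
            break                               # thresholds cannot go lower
    return rules


# Toy data (invented for illustration, not from the meningitis dataset).
pos = [{"onset": "acute", "focal": "-"},
       {"onset": "acute", "focal": "+"},
       {"onset": "chronic", "focal": "+"}]
neg = [{"onset": "chronic", "focal": "-"},
       {"onset": "subacute", "focal": "-"},
       {"onset": "subacute", "focal": "-"}]
rules = learn_positive_rules(pos, neg, minacc=0.95, mincov=1 / 6)
print(rules)
```

Only the positive examples are removed after each rule is found, while Neg is kept intact, which mirrors step 6 of the scheme.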



71.3 Finding Rules from Meningitis Data 

We use two methods for discretizing numerical attributes in the meningitis data: 
entropy-based and rough set-based methods. The entropy-based method often 
yields few intervals of values and ignores many attributes (15 out of 38 
attributes). The rough set-based method divides continuous attributes into more 
intervals of values and does not ignore any attributes. From the discretized 
dataset we created six derived datasets, whose class attributes are DIAG, 
DIAG2, CULT_FIND, CULTURE, C_COURSE and COURSE, respectively. We ran LUPC on 
each of these datasets in two modes: learning one target class and learning all 
classes. Experiments were done with fixed default parameters for finding 
rules: 95% as the minimum accuracy of a rule, 2 cases as the minimum cover of a 
rule, and 100 and 30 as the numbers of candidate attribute-value pairs and rules, 
respectively. Different rules were extracted and synthesized in nearly 80 tables 
in Excel format according to the derived datasets and learning modes, for example: 

IF LOC = [*-1) and 
ONSET = ACUTE and 
CSF_CELL = [1505-*) and 
CELL_POLY = [431-*) 
THEN class = BACTERIA [accuracy = 1.00 (12/12); cover = 0.086] 



Based on the synthesized tables of discovered rules, we have provided the domain 
experts with a number of observations and analyses that are commonly concerned 
with the most frequent attributes in each class, the significant attributes or 
attribute-value pairs, the significant co-occurring attribute-value pairs, the 
strong rules with particularly large coverage if available, and rules that may be 
exceptional. 




Fig. 1. Finding meningitis knowledge with LUPC 

Factors Important for Meningitis Diagnosis DIAG and DIAG2 

From discovered rules for DIAG we observed that: 

• most frequent attributes: Cell_Poly, Loc_Dat, Eeg_Focus, Focal, Ct_Find. 

• significant attributes or attribute-value pairs: 

- “Cell_Poly > 220.5” for BACTE(E) and BACTERIA, 

- “Cell_Poly < 220.5” for VIRUS and VIRUS(E), 

- “Eeg_Focus = +” for VIRUS(E), 

- “Ct_find = abnormal” for ABSCESS. 

• significant co-occurring attribute-value pairs: 

- “Cell_Poly < 220.5” AND “Eeg_Focus = -” for VIRUS, 

- “Cell_Poly < 220.5” AND “Focal = +” for VIRUS(E). 

And from discovered rules for DIAG2: 

• most frequent attributes: Focal, Cell_Poly, Loc_Dat, Eeg_Focus, Ct_Find. 

• significant or discriminant attributes or attribute-value pairs are reconfirmed: 

- “Ct_find = abnormal” for ABSCESS, 

- “Cell_Poly ≥ 220.5” for BACTE(E) and BACTERIA, 

- “Cell_Poly < 220.5” for VIRUS and VIRUS(E). 

• significant co-occurring attribute-value pairs: the above conclusions are 
reconfirmed, with some new ones such as “Cell_Poly > 220.5” AND “Onset = Acute” 
AND “Loc = -1.5” for BACTERIA. 

• rules with large coverage: rules for VIRUS. 

• rules that may be special or typical: rule 1 for ABSCESS, rule 2 for 
BACTERIA. 

A general observation is that there are big groups of VIRUS cases that share 
common symptoms (VIRUS rules with bigger coverage but not very high accuracy), 
while the rules for BACTERIA have relatively smaller coverage but higher 
accuracy. The attribute “ONSET” has high frequency but does not seem significant 
in distinguishing diseases. 

Factors for Predicting Prognosis C_COURSE and COURSE 

From discovered rules for C_COURSE we observed that: 

• most frequent attributes: Lasegue, Focal, Loc_Dat, Onset, Ct_Find. 

• significant or discriminant attributes or attribute-value pairs: 

- for class “dead”: “Locdat = +”, “Eeg_wave = abnormal”, 

- for class “negative”: “Onset = Acute”, “Lasegue = 0”, “Focal = -”, 
“Cell_Mono > 10”. 

• significant co-occurring attribute-value pairs: 

- “Cell_Mono < 10” AND “Locdat = +” for class “dead”, 

- “Eeg_wave = abnormal” AND “Locdat = +” for class “dead”, 

- “Kernig = 0” AND “Focal = -” AND “Crp < 4.8” for class “negative”, 

- “Kernig = 0” AND “Focal = -” AND “Csf_Cell in (30.5-1040)” for class 
“negative”. 

• rules with large coverage: rules from 5 to 17 for class “negative”. 

• rules that may be special or typical: all rules for class “dead”, rule 23 for class 
“negative”. 

And from rules for COURSE: 

• most frequent attributes: Lasegue, Focal, Locdat. 

• significant or discriminant attributes or attribute-value pairs: 

- “Focal = -” in class “n” and “Focal = +” in class “p”, 

- “Locdat = -” in class “n” and “Locdat = +” in class “p”, 

- “Eeg_wave = normal” in class “n”, “Eeg_wave = abnormal” in “p”, 

- “Cell_Mono > 10” in class “n” and “Cell_Mono < 10” in class “p”, 

- “Lasegue = 0” is popular in class “n”. 

• significant co-occurring attribute-value pairs: 

- “Lasegue = 0” AND “Focal = -” AND “Crp < 4.8” in class “n”, 

- “Lasegue = 0” AND “Cell_Mono > 1.0” in class “n”, 

- “Locdat = +” AND “Focal = +” AND “Eeg_wave = abnormal” in “p”, 

- “Locdat = +” AND “Cell_Mono < 1.0” in class “p”. 

• rules with large coverage: most rules for class “n”. 

The two classes “n” and “p” can be distinguished by the obtained rules. 

Detection of Bacteria or Virus: CULTURE and CULT_FIND 

From discovered rules for CULTURE we observed that: 

• most frequent attributes: Loc_Dat, Crp, Ct_Find, Csf_Cell. 

• significant or discriminant attributes or attribute-value pairs: 

- “Loc_dat = -”, “Crp < 4.8”, “Cell_Mono > 10” are popular in class 

- “Eeg_wave = abnormal”, “Ct_find = abnormal” are popular in classes 
“herpes” and “strepto” 

• significant co-occurring attribute-value pairs: 

- “Loc_dat = -” AND “Crp < 4.8” AND “Cell_Mono > 10” in class 

- “Eeg_wave = abnormal” AND “Ct_find = abnormal” OR “Eeg_wave = 
abnormal” AND “Risk = sinusitis” in class “strepto”. 

• rules with large coverage: most rules for class 

And from rules for CULT_FIND: 

• most frequent attributes: Loc_Dat, Eeg_Focus, Csf_Cell, Ct_Find, Risk. 

• significant or discriminant attributes or attribute-value pairs: 

- “Loc_dat = -” is popular in “E” while “Loc_dat = +” is popular in “T”, 

- “Crp < 4.8” is popular in “E” while “Crp > 4.8” is popular in “T”, 

- “Cell_Mono > 10” is popular in “E” while “Cell_Mono < 10” is popular in “T”, 

- “Ct_find = normal” is popular in “E” while “Ct_find = abnormal” is popular 
in “T”, 

- “Risk = p” is popular in “E” while “Risk = n” OR “Risk = sinusitis” are 
popular in “T”. 

• significant co-occurring attribute-value pairs: 

- “Onset = acute” AND “Crp < 4.8” in “E”, 

- “Loc_dat = +” AND “Risk = n” in “T”. 



71.4 Conclusion 

We have briefly introduced the method LUPC, which learns the target positive 
class from large unbalanced datasets. The essence of LUPC is its combination of 
separate-and-conquer rule induction with association rule mining, as well as its 
use of dynamic multiple thresholds and of the properties of unbalanced datasets. 
We applied LUPC to investigate the meningitis dataset. Many rules with high 
accuracy were found for factors important for diagnosis (DIAG and DIAG2), for 
the detection of bacteria or virus (CULT_FIND and CULTURE), and for predicting 
prognosis (C_COURSE and COURSE). Appendixes 1 and 2 summarize the rules 
extracted for DIAG. 
Appendix 1 



LUPC’s rule learning includes two modes: learning all classes and learning only one target 
class. The following four tables show the numbers of cases covered by the rules for “DIAG2” 
and “DIAG” obtained by LUPC under the conditions: (1) learning mode: all classes, 
(2) minimum accuracy: 95%, (3) minimum cover: 2 cases, (4) number of candidate 
conditions: 100, (5) number of candidate rules: 30. Table 2 is the result on “DIAG2” 
discretized by entropy. Table 3 is the result on “DIAG2” discretized by Rosetta. Likewise, 
Table 4 and Table 5 are the results on “DIAG” discretized by entropy and Rosetta. 
Table 2. Rules from "diag2" discretized by entropy 

class      ID   (a)    (b)   (c)   (d) 
BACTERIA    1   1.00    32    32   0.23 
BACTERIA    2   1.00    13    13   0.09 
BACTERIA    3   1.00    12    12   0.09 
BACTERIA    4   1.00    11    11   0.08 
BACTERIA    5   1.00     8     8   0.06 
VIRUS       6   0.95   100    95   0.72 
VIRUS       7   0.97    88    85   0.63 
VIRUS       8   0.99    83    82   0.60 

(a): accuracy 
(b): number of covered cases 
(c): number of correct cases 
(d): coverage of the rule 

Table 3. Rules from "diag2" discretized by Rosetta 

class      ID   (a)    (b)   (c)   (d) 
BACTERIA    1   1.00    27    27   0.19 
BACTERIA    2   1.00    15    15   0.11 
BACTERIA    3   1.00    14    14   0.10 
BACTERIA    4   1.00    12    12   0.09 
BACTERIA    5   1.00     9     9   0.06 
BACTERIA    6   1.00     9     9   0.06 
BACTERIA    7   1.00     9     9   0.06 
BACTERIA    8   1.00     5     5   0.04 
VIRUS       9   0.96    47    45   0.34 
VIRUS      10   0.96    45    43   0.32 
VIRUS      11   0.96    45    43   0.32 
VIRUS      12   0.96    45    43   0.32 
VIRUS      13   0.95    44    42   0.31 
VIRUS      14   0.95    43    41   0.31 
VIRUS      15   0.95    42    40   0.30 
VIRUS      16   0.95    42    40   0.30 
VIRUS      17   0.95    40    38   0.29 
VIRUS      18   0.97    32    31   0.23 
VIRUS      19   0.97    32    31   0.23 
VIRUS      20   0.97    29    28   0.21 
VIRUS      21   0.96    28    27   0.20 
VIRUS      22   0.96    27    26   0.19 
VIRUS      23   0.96    26    25   0.19 
VIRUS      24   1.00    23    23   0.16 
VIRUS      25   0.96    23    22   0.16 
VIRUS      26   0.96    23    22   0.16 

Table 4. Rules from "diag" discretized by entropy 

class      ID   (a)    (b)   (c)   (d) 
ABSCESS     1   1.00     6     6   0.04 
ABSCESS     2   1.00     3     3   0.02 
ABSCESS     3   1.00     2     2   0.01 
ABSCESS     4   1.00     2     2   0.01 
BACTE(E)    5   1.00     3     3   0.02 
BACTE(E)    6   1.00     2     2   0.01 
BACTE(E)    7   1.00     2     2   0.01 
BACTERIA    8   1.00    11    11   0.08 
BACTERIA    9   1.00    10    10   0.07 
BACTERIA   10   1.00     8     8   0.06 
BACTERIA   11   1.00     8     8   0.06 
BACTERIA   12   1.00     8     8   0.06 
BACTERIA   13   1.00     6     6   0.04 
BACTERIA   14   1.00     5     5   0.04 
VIRUS      15   0.95    61    58   0.44 
VIRUS      16   0.95    60    57   0.43 
VIRUS      17   0.97    58    56   0.41 
VIRUS      18   0.96    54    52   0.39 
VIRUS      19   0.96    51    49   0.36 
VIRUS      20   0.95    43    41   0.31 
VIRUS(E)   21   1.00    11    11   0.08 
VIRUS(E)   22   1.00    11    11   0.08 
VIRUS(E)   23   1.00    10    10   0.07 
VIRUS(E)   24   1.00     9     9   0.06 
VIRUS(E)   25   1.00     9     9   0.06 
VIRUS(E)   26   1.00     8     8   0.06 
VIRUS(E)   27   1.00     6     6   0.04 
VIRUS(E)   28   1.00     7     7   0.05 

Table 5. Rules from "diag" discretized by Rosetta 

class      ID   (a)    (b)   (c)   (d) 
ABSCESS     1   1.00     6     6   0.04 
ABSCESS     2   1.00     4     4   0.03 
ABSCESS     3   1.00     4     4   0.03 
ABSCESS     4   1.00     4     4   0.03 
BACTE(E)    5   1.00     3     3   0.02 
BACTE(E)    6   1.00     3     3   0.02 
BACTE(E)    7   1.00     3     3   0.02 
BACTE(E)    8   1.00     3     3   0.02 
BACTE(E)    9   1.00     2     2   0.01 
BACTE(E)   10   1.00     2     2   0.01 
BACTERIA   11   1.00    12    12   0.09 
BACTERIA   12   1.00     9     9   0.06 
BACTERIA   13   1.00     8     8   0.06 
BACTERIA   14   1.00     7     7   0.05 
BACTERIA   15   1.00     7     7   0.05 
BACTERIA   16   1.00     6     6   0.04 
BACTERIA   17   1.00     6     6   0.04 
BACTERIA   18   1.00     6     6   0.04 
BACTERIA   19   1.00     5     5   0.04 
BACTERIA   20   1.00     5     5   0.04 
BACTERIA   21   1.00     4     4   0.03 
VIRUS      22   0.95    22    21   0.16 
VIRUS      23   0.95    21    20   0.15 
VIRUS      24   0.95    21    20   0.15 
VIRUS      25   0.95    21    20   0.15 
VIRUS      26   0.95    20    19   0.14 
VIRUS      27   1.00    18    18   0.13 
VIRUS      28   1.00    15    15   0.11 
VIRUS      29   1.00    15    15   0.11 
VIRUS      30   1.00    14    14   0.10 
VIRUS      31   1.00    14    14   0.10 
VIRUS      32   0.97    33    32   0.24 
VIRUS(E)   33   1.00    10    10   0.07 
VIRUS(E)   34   1.00    10    10   0.07 
VIRUS(E)   35   1.00     9     9   0.06 
VIRUS(E)   36   1.00     9     9   0.06 
VIRUS(E)   37   1.00     9     9   0.06 
VIRUS(E)   38   1.00     7     7   0.05 
VIRUS(E)   39   1.00     7     7   0.05 
VIRUS(E)   40   1.00     7     7   0.05 
VIRUS(E)   41   1.00     7     7   0.05 
VIRUS(E)   42   1.00     7     7   0.05 
VIRUS(E)   43   1.00     6     6   0.04 
VIRUS(E)   44   1.00     6     6   0.04 
VIRUS(E)   45   1.00     6     6   0.04 
Appendix 2 

Rules from “diag” with Rosetta discretization for all classes 

Values of attributes contained in each rule 

Columns: class, rule ID, accuracy, # of covered cases, # of correct cases, 
coverage, followed by one column per attribute. Legible attribute headers: AGE, 
SEX, COLD, HEADACHE, FEVER, NAUSEA, SEIZURE, ONSET, STIFF, KERNIG, WBC, 
CT_FIND, EEG_WAVE, EEG_FOCUS, Cell_Poly, CULT_FIND, CULTURE, C_COURSE, COURSE. 

class     rule ID   accuracy   coverage   legible attribute values 
ABSCESS   1         1.00       0.04       ACUTE (ONSET), abnormal 
ABSCESS   2         1.00       0.03 
ABSCESS   3         1.00       0.03 
72. Basket Analysis on Meningitis Data 

Takayuki Ikeda, Takashi Washio, and Hiroshi Motoda 

Institute of Scientific and Industrial Research, Osaka University, 

8-1, Mihogaoka, Ibarakishi, Osaka, 567-0047, JAPAN 

Basket Analysis is the most representative approach in recent data mining 
research. However, it cannot be applied directly to data that include numeric 
attributes. In this paper, we propose an algorithm and performance measures 
for selecting and discretizing numeric attributes in the data preprocessing 
stage, so that Basket Analysis can be applied more widely. The performance 
is evaluated through an application to the meningitis data. 



72.1 Introduction 

Basket Analysis is the most representative approach in the study of data 
mining [72.1], and has come to be widely used in real-world applications 
in recent years. Against this background, we decided to apply Basket 
Analysis to the meningitis data given in this discovery challenge [72.2]. 
However, Basket Analysis has the drawback that it cannot handle data involving 
numeric information, such as the meningitis data, because in principle it 
mines associations among discrete events. Thus, a task that selects the 
numeric attributes associated with other attributes and discretizes their 
values must be introduced into the mining process. One approach is to embed 
this task in the mining algorithm, as is done in decision-tree-based mining 
such as C4.5 [72.3]. There, the mining algorithm directly accepts the numeric 
data, selects attributes relevant to the class, and discretizes the values of 
the selected numeric attributes while developing the decision tree. However, 
this approach is not suitable for Basket Analysis, since its algorithm does 
not include any process of intermediate estimation of value distributions for 
massive data. 

Accordingly, we followed another approach, which applies the selection 
and the discretization in the data preprocessing stage. Four issues arise in 
its development. The first is that the selection of the numeric attributes 
must take into account the dependency among the attributes, because an 
association represents a strong dependency among the events characterized by 
the attributes. The second is that the discretization points in the value 
ranges of the numeric attributes must be chosen to reflect appropriately the 
dependency of the data distribution among multiple attributes. The third is 
that the discretization must have appropriate granularity: if the granularity 
is too fine, the excessive fragmentation of a dependent region reduces the 
number of data representing the association of the values in each fragmented 
region. The fourth is to establish an efficient algorithm for the selection 
and the discretization under massive data, though this issue is not crucial 
for the meningitis data, since the number of data is very limited. 

The objectives of this paper are as follows. 

(1) Development of an approach for the selection and the discretization of 
numeric attributes addressing the aforementioned four issues. 

(2) Application of the approach and Basket Analysis to the meningitis data, 
the evaluation of their performance and the discussion on the discovered 
knowledge. 



72.2 Method for Selection and Discretization 

72.2.1 Algorithm 

First, we describe the algorithm we developed to select and discretize the 
numeric attributes [72.4]. The entire flow chart of the algorithm is depicted 
in Fig. 72.1. Given a performance measure, the algorithm takes a greedy 
strategy so that the selection and discretization can be conducted efficiently 
on a large database; it therefore does not guarantee the optimum selection 
and discretization. The details of the performance measure are described 
in the following subsection. 

Initially, the minimum value in the value range of the data for a numeric 
attribute is set as a candidate threshold. The data are discretized with this 
threshold, the performance is evaluated, and the result is compared with the 
best performance among the former candidate thresholds, if any exist. When 
the newest candidate threshold performs better, the threshold and its 
performance are recorded. The threshold value is then increased by some small 
amount, and this search is repeated until all candidate thresholds for every 
attribute have been evaluated. Once the repetition is finished, the attribute 
and threshold value with the best performance are selected, and the data are 
discretized at that threshold of that attribute. After one threshold has been 
determined, the search for another attribute and threshold is repeated until 
the number of thresholds reaches a given upper limit. The selection and the 
discretization are applied only to the numeric attributes in the data, and the 
discretized attributes are merged with the original categorical attributes. As 
the loop structure of the algorithm shows, the algorithm needs computational 
time only of the order of O(ND), where N and D are the number of data and the 
number of numeric attributes. Because the computational time is linear in the 
data size, the algorithm can process a large amount of data efficiently, and 
hence it addresses the issue of efficiency described in the first section. 
Fig. 72.1. Greedy algorithm for selection and discretization. 
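
The greedy search described above can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration and are not taken from the chapter: candidate thresholds are evenly spaced between the minimum and maximum of each attribute (`steps` controls the spacing), the names `region_counts`, `entropy_score` and `greedy_discretize` are hypothetical, and a plain class entropy stands in for the performance measure, where lower scores are treated as better.

```python
from collections import Counter
from math import log


def region_counts(rows, labels, cuts):
    """Count class labels per region of the discretized space.

    A region is identified by the tuple of interval indices of a row,
    one index per attribute that currently has thresholds in `cuts`.
    """
    regions = {}
    for row, label in zip(rows, labels):
        key = tuple(sum(row[a] >= t for t in ts) for a, ts in sorted(cuts.items()))
        regions.setdefault(key, Counter())[label] += 1
    return regions


def entropy_score(regions):
    """Stand-in performance measure (assumption): total class entropy."""
    total = 0.0
    for counts in regions.values():
        n = sum(counts.values())
        total -= sum(c * log(c / n) for c in counts.values())
    return total


def greedy_discretize(rows, labels, max_thresholds, steps=20, score=entropy_score):
    """Greedily pick one (attribute, threshold) pair per outer iteration."""
    n_attrs = len(rows[0])
    cuts = {}  # attribute index -> list of chosen thresholds
    for _ in range(max_thresholds):
        best = None  # (score, attribute, threshold)
        for a in range(n_attrs):
            lo = min(r[a] for r in rows)
            hi = max(r[a] for r in rows)
            for k in range(1, steps):  # raise the candidate in small steps
                t = lo + (hi - lo) * k / steps
                if hi == lo or t in cuts.get(a, []):
                    continue
                trial = dict(cuts)
                trial[a] = cuts.get(a, []) + [t]
                s = score(region_counts(rows, labels, trial))
                if best is None or s < best[0]:
                    best = (s, a, t)
        if best is None:
            break
        cuts.setdefault(best[1], []).append(best[2])
    return cuts
```

Any function of the per-region class counts can be passed as `score`, so the same loop can be reused with a different performance measure.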



72.2.2 Performance Measure 

The most representative performance measure is the information entropy 
evaluated from the class distribution of the data on each numeric attribute 
axis. This measure is used in the selection and discretization scheme of 
C4.5 [72.3]. However, it cannot take into account the dependency of the class 
distribution among multiple numeric attributes, since the selection and the 
discretization are applied to each attribute individually. Accordingly, our 
work uses a performance measure based on the class distribution in each 
region of the space generated by the discretization. The distribution 
$S_{i,j}$ of the data having class $j$ ($= 1, 2$) in a region $i$ is depicted 
in Fig. 72.2. In this figure, $|S_{i,1}| = 3$ and $|S_{i,2}| = 2$. Based on 
the total distribution of the $|S_{i,j}|$, a performance measure such as the 
information entropy of the entire discretization can be calculated. 

Fig. 72.2. Regions and class distribution. 

However, the information entropy based on the discretized region space 
does not suggest the appropriate number of attributes and thresholds for the 
selection and the discretization. Therefore, it does not address the issue of 
granularity described in the first section. This difficulty is solved by 
introducing the well-known measure AIC (Akaike’s Information Criterion), 
which represents the Kullback information entropy [72.5]. Given a 
discretization pattern $H_T$, the AIC under $H_T$ is evaluated as follows. 

$$AIC(H_T) = -2 \sum_{i=1}^{M} \sum_{j=1}^{K_i} |S_{i,j}| \log \frac{|S_{i,j}|}{|S_i|} + 2a, \qquad (72.1)$$ 

where $M$ is the total number of discretized regions containing some data, 
$|S_i|$ the number of data in the $i$-th region, $K_i$ the number of classes 
appearing in the $i$-th region, $|S_{i,j}|$ the number of data having the 
$j$-th class in the $i$-th region, and $a$ the number of thresholds in $H_T$. 
This measure can estimate a discretization of appropriate granularity, one 
that does not fragment the dependent regions among attributes, under the 
assumption that the rectangular parallelepipeds indicated in Fig. 72.2 can 
asymptotically subsume each dependent region. However, this assumption does 
not always hold, since the shapes of the dependent regions are not limited to 
parallelepipeds. Hence we sought another measure to address the granularity 
issue. 
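
As a hedged illustration of how such a criterion could be computed, the sketch below assumes that Eq.(72.1) has the usual AIC form, namely minus twice the maximum log-likelihood of the class labels within the regions plus twice the number of thresholds. The function name `aic` and the input layout (one dictionary of class counts per non-empty region) are assumptions introduced here, not part of the chapter.

```python
from math import log


def aic(regions, n_thresholds):
    """AIC of a discretization under the assumed form -2 * logL + 2 * a.

    regions: list of dicts mapping class label -> number of data in that
             region having that class (only non-empty regions).
    n_thresholds: number of thresholds a in the discretization pattern.
    """
    log_lik = 0.0
    for counts in regions:
        n_i = sum(counts.values())  # |S_i|
        for c in counts.values():   # |S_ij| for each class present
            if c > 0:
                log_lik += c * log(c / n_i)
    return -2.0 * log_lik + 2.0 * n_thresholds
```

Under this reading, a discretization whose regions are all pure scores exactly `2 * n_thresholds`, so an extra threshold pays off only if it reduces class mixing enough to offset the penalty of 2.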
The performance measure proposed in this work is based on the Minimum 
Description Length (MDL) principle [72.6]. The description length used in 
this work, $Length(H_T)$, is the sum of the description length of a 
discretization pattern $H_T$, i.e., the code book length, and the description 
length of the class information of the given data under $H_T$, i.e., the 
coding length. The formula of $Length(H_T)$ depends on the coding method of 
the class information. When the combination of the classes appearing in each 
discretized region is coded, the formula is as follows. 



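$$
\begin{aligned}
Length(H_T) = {} & -\sum_{\substack{i=1 \\ (|S_i| \neq 0)}}^{M} \sum_{j=1}^{K_i} (|S_{i,j}| + 1) \cdot \log_2 \frac{|S_{i,j}| + 1}{|S_i| + K_i} \\
& + \sum_{n=1}^{D} \left[ -(a + 1) \cdot \log_2 \frac{a + 1}{|T_n| + 2} - (|T_n| - a + 1) \cdot \log_2 \frac{|T_n| - a + 1}{|T_n| + 2} \right] \\
& + M \cdot \log_2 (2^K - 1), \qquad (72.2)
\end{aligned}
$$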



where $|T_n|$ is the number of candidate thresholds for the $n$-th attribute 
and $K$ the total number of classes. When codes are assigned to all classes, 
even if some classes do not appear in a discretized region, the combination of 
the classes does not have to be coded. Thus, we obtain the following formula. 



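$$
\begin{aligned}
Length(H_T) = {} & -\sum_{\substack{i=1 \\ (|S_i| \neq 0)}}^{M} \sum_{j=1}^{K} (|S_{i,j}| + 1) \cdot \log_2 \frac{|S_{i,j}| + 1}{|S_i| + K} \\
& + \sum_{n=1}^{D} \left[ -(a + 1) \cdot \log_2 \frac{a + 1}{|T_n| + 2} - (|T_n| - a + 1) \cdot \log_2 \frac{|T_n| - a + 1}{|T_n| + 2} \right]. \qquad (72.3)
\end{aligned}
$$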



The MDL principle suggests that the selection and discretization pattern 
giving the minimum $Length(H_T)$ is the best in terms of parsimonious 
description of the data. Eq.(72.1), Eq.(72.2) and Eq.(72.3) are each applied 
to the algorithm of Fig. 72.1 as the performance measure. 
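
To make the coding-length idea concrete, the following sketch computes only the class-information part of a description length, under the assumption that it is the sum over non-empty regions $i$ and all $K$ classes $j$ of $-(|S_{i,j}| + 1) \log_2 \frac{|S_{i,j}| + 1}{|S_i| + K}$, i.e. a code built from add-one smoothed class frequencies. The code book term for the thresholds is deliberately left out, and the name `class_coding_length` and the list-of-counts input are hypothetical.

```python
from math import log2


def class_coding_length(regions, n_classes):
    """Coding length (in bits) of the class information, assumed form.

    regions: list of per-region count lists, each of length n_classes,
             where entry j is the number of data of class j in the region.
    """
    total = 0.0
    for counts in regions:
        n_i = sum(counts)
        if n_i == 0:
            continue  # regions without data are not coded
        for c in counts:
            total -= (c + 1) * log2((c + 1) / (n_i + n_classes))
    return total
```

With this form, a region holding four data of one class is cheaper to describe than a region holding two data of each of two classes, which is the behavior a description-length measure is expected to reward.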



72.3 Application 

The meningitis data provided in this JKDD01 Challenge contain 140 cases 
consisting of 21 numeric attributes and 13 categorical attributes. They 
represent the contents of medical examination, inspection and treatment. The 
data also include two class attributes on the diagnosis result, DIAG and 
DIAG2. We applied our approach to the class DIAG, which takes 6 values. 
The objectives of the mining analysis are as follows. 

(a) To obtain association rules which indicate the conditions of the class 
DIAG. 

(b) To obtain association rules describing the relations among multiple at- 
tributes. 
(c) To compare the empirical characteristics of the three performance 
measures. 

(d) To obtain the review on the mining results by a medical expert and to 
reflect the review comments to the further analyses and discussions. 

The procedure of the analysis consists of the following five stages. First, 
the aforementioned selection and discretization method is applied to the 
numeric attributes except CSF_CELL3, which contains some missing values. 
Thus, the selection and the discretization are applied to 20 numeric 
attributes. Second, the categorical attributes except THERAPY2, which 
represents the treatment method, i.e., 12 attributes in total, are combined 
with the discretized attributes. THERAPY2 was removed because the treatment 
is a consequence of the diagnosis, not a condition. Third, each attribute is 
combined with its value and transformed into the form of an item. This is 
because the original data have a table format, whereas Basket Analysis in the 
later stage accepts only data in an item transaction format. Fourth, Basket 
Analysis is applied to the data preprocessed in the former stages. Fifth, 
the association rules containing the class attribute DIAG in the head part 
are collected for the aforementioned objective (a). The other rules are 
collected separately for objective (b). Each of the performance measures 
Eq.(72.1), Eq.(72.2) and Eq.(72.3) is applied to this mining process. 
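
The third stage, turning a table row into an item transaction, might look like the sketch below. The item format `[ATTRIBUTE] : value` and the `under`/`over` labels imitate the rules reported later in this chapter; the label for intervals between two thresholds (`55.2to307.9`), the treatment of a value equal to a threshold, and the function names are assumptions made for illustration.

```python
from bisect import bisect_right


def interval_label(value, cuts):
    """Name the interval of `value` relative to the thresholds in `cuts`."""
    cuts = sorted(cuts)
    k = bisect_right(cuts, value)  # number of thresholds <= value
    if k == 0:
        return f"under{cuts[0]:g}"
    if k == len(cuts):
        return f"over{cuts[-1]:g}"
    return f"{cuts[k - 1]:g}to{cuts[k]:g}"  # hypothetical inner-interval label


def to_transaction(record, thresholds, categorical):
    """Convert one table row (dict) into a set of attribute-value items.

    thresholds: numeric attribute -> list of selected thresholds.
    categorical: names of categorical attributes to keep as they are.
    Attributes in neither collection are dropped.
    """
    items = set()
    for attr, value in record.items():
        if value is None:
            continue  # missing values produce no item
        if attr in thresholds:
            items.add(f"[{attr}] : {interval_label(value, thresholds[attr])}")
        elif attr in categorical:
            items.add(f"[{attr}] : {value}")
    return items
```

For example, a row with `CRP = 1.2`, `LOC_DAT = "-"` and `DIAG = "VIRUS"` and a single CRP threshold of 3.1 yields the items `[CRP] : under3.1`, `[LOC_DAT] : -` and `[DIAG] : VIRUS`.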



72.4 Result and Expert’s Evaluation 

Table 72.1 shows the attributes and their threshold values derived through 
the selection and discretization process by each performance measure. AIC 
Eq.(72.1) selects and discretizes only a small number of attributes, but the 
selected attributes show some variety. MDL Eq.(72.2) selects and discretizes 
many attributes, but does not show much variety in the attribute selection. 
MDL Eq.(72.3) selects and discretizes as many attributes as MDL Eq.(72.2), 
which may be due to the similarity of the criterion formulae. However, the 
attributes discretized by MDL Eq.(72.3) show much variety. The value of 
Eq.(72.2) is highly sensitive to the number of classes included in each 
region, because the combination of the classes appearing in each region is 
coded in the measure, and the number of combinations is exponential in the 
number of classes. This sensitivity favors regions whose class distribution 
is dominated by a single class, and reduces the chance of discretizing a 
region that includes various classes when regions dominated by a single class 
are hardly obtained by the discretization. Accordingly, a region already 
dominated by a class tends to be selected and discretized further on the 
attributes that have already been used for the discretization. 

Table 72.1. Discretized attributes and threshold values 

       AIC Eq.(72.1)            MDL Eq.(72.2)            MDL Eq.(72.3) 
m      attribute   threshold    attribute   threshold    attribute   threshold 
1      Cell_Poly   221.0        Cell_Poly   307.9        Cell_Poly   307.9 
2      Cell_Mono   12.0         CSF_CELL    33.8         Cell_Mono   83.3 
3      CRP         3.1          Cell_Poly   131.6        FEVER       7.0 
4      Cell_Mono   320.0        SEIZURE     1.86         AGE         46.2 
5      HEADACHE    3.0          Cell_Poly   6935.2       NAUSEA      4.0 
6      CSF_GLU     55.0         CSF_CELL    8468.4       CSF_PRO     44.2 
7      BT          37.0         Cell_Poly   657.8        Cell_Poly   55.2 
8      -           -            Cell_Poly   2115.7       LOC         0.0013 
9      -           -            CSF_CELL    2668.3       HEADACHE    5.0 
10     -           -            Cell_Poly   55.2         WBC         6846.9 
11     -           -            WBC         19680.8      BT          37.8 
12     -           -            CSF_CELL    84.7         CSF_GLU     49.0 
13     -           -            CSF_GLU     108.8        Cell_Poly   2.6 
14     -           -            Cell_Poly   28.9         Cell_Poly   7.6 
15     -           -            Cell_Mono   117.8        Cell_Poly   15.9 

By applying the subsequent stages for the three performance measures, 
we obtained dozens of association rules. They were presented to the medical 
expert who provided the data. The expert suggested that the attribute 
RISK(Grouped) should be removed from the data, since it is generated by 
grouping the values of another attribute, RISK, and shares redundant 
information with RISK. After its removal, the mining process was repeated. 
The following are examples of the sets of association rules containing 
the class attribute DIAG in the head part, derived by the second mining 
process under some minimum support and minimum confidence levels. 

AIC Eq.(72.1): Minimum Support=45% and Minimum Confidence=60% 

{[LOC_DAT] : -} ==> {[CRP] : under3.1, [Cell_Poly] : under221, [DIAG] : VIRUS} 
{[LOC_DAT] : -} ==> {[Cell_Poly] : under221, [CT_FIND] : normal, [DIAG] : VIRUS} 
{[LOC_DAT] : -} ==> {[Cell_Poly] : under221, [RISK] : n, [DIAG] : VIRUS} 
{[Cell_Poly] : under221, [RISK] : n} ==> {[CRP] : under3.1, [DIAG] : VIRUS} 
{[Cell_Poly] : under221, [RISK] : n} ==> {[DIAG] : VIRUS, [LOC_DAT] : -} 
{[Cell_Poly] : under221, [RISK] : n} ==> {[DIAG] : VIRUS, [FOCAL] : -} 
{[Cell_Poly] : under221, [RISK] : n} ==> {[DIAG] : VIRUS, [C_COURSE] : negative} 
{[Cell_Poly] : under221} ==> {[CRP] : under3.1, [DIAG] : VIRUS} 
{[Cell_Poly] : under221} ==> {[DIAG] : VIRUS, [LOC_DAT] : -} 
{[Cell_Poly] : under221} ==> {[DIAG] : VIRUS, [FOCAL] : -} 
{[Cell_Poly] : under221} ==> {[DIAG] : VIRUS, [C_COURSE] : negative} 
{[Cell_Poly] : under221, [C_COURSE] : negative} ==> {[DIAG] : VIRUS, [COURSE(Grouped)] : n} 
{[Cell_Poly] : under221, [COURSE(Grouped)] : n} ==> {[DIAG] : VIRUS, [C_COURSE] : negative} 

MDL Eq.(72.2): Minimum Support=45% and Minimum Confidence=60% 

{[LOC_DAT] : -} ==> 
{[DIAG] : VIRUS, [SEIZURE] : under1.85, [WBC] : under19680, [CSF_GLU] : under108} 
{[FOCAL] : -} ==> 
{[DIAG] : VIRUS, [SEIZURE] : under1.85, [WBC] : under19680, [CSF_GLU] : under108} 
{[CT_FIND] : normal} ==> {[DIAG] : VIRUS, [SEIZURE] : under1.85, [CSF_GLU] : under108} 
{[SEIZURE] : under1.85, [Cell_Mono] : over17.7} ==> {[DIAG] : VIRUS, [CSF_GLU] : under108} 
{[Cell_Mono] : over17.7, [ONSET] : ACUTE, [RISK] : n} ==> 
{[DIAG] : VIRUS, [SEIZURE] : under1.85, [CSF_GLU] : under108} 
{[Cell_Mono] : over17.7, [ONSET] : ACUTE} ==> 
{[DIAG] : VIRUS, [SEIZURE] : under1.85, [CSF_GLU] : under108, [RISK] : n} 
{[SEIZURE] : under1.85, [Cell_Mono] : over17.7, [CSF_GLU] : under108, [RISK] : n} ==> 
{[DIAG] : VIRUS, [WBC] : under19680} 
{[SEIZURE] : under1.85, [Cell_Mono] : over17.7, [CSF_GLU] : under108, [RISK] : n} ==> 
{[DIAG] : VIRUS, [ONSET] : ACUTE} 
{[SEIZURE] : under1.85, [Cell_Mono] : over17.7, [CSF_GLU] : under108} ==> 
{[DIAG] : VIRUS, [WBC] : under19680, [RISK] : n} 
{[SEIZURE] : under1.85, [Cell_Mono] : over17.7, [CSF_GLU] : under108} ==> 
{[DIAG] : VIRUS, [ONSET] : ACUTE} 
{[SEIZURE] : under1.85, [CSF_GLU] : under108, [ONSET] : ACUTE, [RISK] : n} ==> 
{[DIAG] : VIRUS} 
{[SEIZURE] : under1.85, [CSF_GLU] : under108, [ONSET] : ACUTE} ==> {[DIAG] : VIRUS} 

MDL Eq.(72.3): Minimum Support=45% and Minimum Confidence=60% 

{[LOC] : under0.0013} ==> {[DIAG] : VIRUS} 
{[LOC_DAT] : -} ==> {[DIAG] : VIRUS, [CT_FIND] : normal} 
{[LOC_DAT] : -} ==> {[DIAG] : VIRUS, [RISK] : n} 
{[FOCAL] : -} ==> {[DIAG] : VIRUS, [RISK] : n} 
{[CT_FIND] : normal} ==> {[DIAG] : VIRUS, [LOC_DAT] : -} 
{[C_COURSE] : negative} ==> {[DIAG] : VIRUS} 

The contents of the mined rule sets depend strongly on the performance 
measure. In the case of MDL Eq.(72.2), except for the first three rules, many 
rules have mutually similar body parts and/or head parts, which are derived 
from almost identical frequent itemsets. The first three rules are also 
derived from mutually similar frequent itemsets. Only similar frequent 
itemsets are derived because the discretization of MDL Eq.(72.2) does not 
show much variety in the attribute selection, as pointed out earlier. The 
case of AIC Eq.(72.1) shows a similar tendency. Because the number of numeric 
attributes selected and discretized by this performance measure is small, as 
shown in Table 72.1, the number of frequent itemsets found in Basket Analysis 
becomes small. This effect also reduces the variety of the frequent itemsets 
derived in Basket Analysis. On the other hand, the case of MDL Eq.(72.3) 
shows more variety in the combinations of items appearing within a small 
number of rules. This is because the variety and the number of the attributes 
selected and discretized by this performance measure are large, as mentioned 
earlier. In addition, the number of items included in each rule is smaller 
than in the other cases. This is also due to the large variety and the large 
number of the discretized attributes: this property of the selection and the 
discretization increases the number of discretized regions in the numeric 
attribute space, so the number of data in a region is reduced, which makes 
the frequent itemsets smaller. An almost identical tendency was observed for 
the association rules describing the relations among multiple attributes 
excluding the class attribute. They are not shown due to space limitations. 



72.5 Conclusion 

The medical expert judged that the rule sets derived by using MDL Eq.(72.3) 
contain more interesting rules than the other cases. He ordered the 
performance measures in terms of their ability to derive interesting rules as 
follows. 

MDL Eq.(72.3) > MDL Eq.(72.2) > AIC Eq.(72.1) (72.4) 

This order matches the order of the variety of itemset combinations 
in the association rules. The set of rules among various attributes and their 
thresholds suggests many potential mechanisms underlying the data. 

In conclusion, the performance measure used for discretizing the values of 
the numeric attributes strongly affects the results of Basket Analysis. A 
performance measure that selects a variety of attributes and many threshold 
values captures relations among events that are interesting to domain experts. 
This insight should be validated through more extensive analysis in the future. 
Genetic programming (GP) usually has a wide search space and can use a 
tree structure as its chromosome expression, so GP may find a globally 
optimal solution. In general, however, GP’s learning speed is not very fast. 
The Apriori algorithm is one of the algorithms for generating association 
rules. It can be applied to large databases, but it is difficult to set its 
parameters without experience. We propose a technique for rule discovery from 
a database that uses GP combined with an association rule algorithm. It takes 
the rules generated by the association rule algorithm as the initial 
individuals of GP, and the combined algorithm improves the learning speed of 
GP. To verify the effectiveness of the proposed method, we apply it to the 
meningoencephalitis diagnosis activity data of a hospital. We obtained a 
domain expert’s comments on our results, and we compare the results of the 
proposed method with those of prior ones. 



73.1 Introduction 

Various techniques have been proposed for rule discovery using classification 
learning. In general, the learning speed of a system using genetic program- 
ming (GP) [73.1] is slow. However, a learning system which can acquire struc- 
tural knowledge by adjusting to the environment can be constructed, because 
GP’s chromosome expression is tree structure, and the structure is evaluated 
by fitness value for the environment. 

On the other hand, the Apriori algorithm [73.2] is a technique for generating
association rules from large databases. It uses two indices for rule
construction: a support value and a confidence value. Depending on the
threshold set for each index, the search space can be reduced. However,
reducing the search space may prevent an unexpected rule from being
extracted. Conversely, when there are many association rule candidates, the
load on the expert who analyzes the rules increases, and it may become
difficult to find a useful rule. Some experience is necessary to set
effective thresholds.
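
The two indices can be illustrated with a minimal sketch. It assumes the
usual definitions (support is the fraction of records containing an itemset;
confidence is the support of the whole rule divided by the support of its
condition part). The function names and the toy records are hypothetical and
are not taken from the experiments in this paper.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    sup_a = support(antecedent, transactions)
    return support(both, transactions) / sup_a if sup_a > 0 else 0.0


# Toy records (hypothetical): each record is a set of "attribute=value" items.
transactions = [
    frozenset({"A=T", "B=T", "class=P"}),
    frozenset({"A=T", "B=F", "class=P"}),
    frozenset({"A=T", "B=T", "class=N"}),
    frozenset({"A=F", "B=T", "class=N"}),
]

sup = support({"A=T", "class=P"}, transactions)        # 2 of 4 records
conf = confidence({"A=T"}, {"class=P"}, transactions)  # 2 of the 3 A=T records
print(sup, conf)
```

Raising the support threshold above 0.5 would discard the rule
"A=T -> class=P" in this toy example, which is the trade-off described above.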

Both techniques thus have advantages and disadvantages. In this paper, we
propose an extended genetic programming that uses the Apriori algorithm for
rule discovery. With the combined rule generation learning method, we expect
to construct a system that can search for highly accurate rules in large
databases. The purpose of this research is to achieve high forecast accuracy
with a small number of rules.



73.2 Genetic Programming 

Genetic programming (GP) is a learning method based on the theory of natural
evolution, and the flow of the algorithm is similar to that of the genetic
algorithm (GA). The difference between GP and GA is that GP extends the
chromosome to allow structural expressions using function nodes and terminal
nodes [73.1]. In this paper, the tree structure is used to express a decision
tree.

Decision tree construction by GP follows the procedure below.

1. An initial population is generated from a random grammar of the function 
nodes and the terminal nodes defined for each problem domain. 

2. The fitness value, which relates to the problem solving ability, for each 
individual of the GP population is calculated. 

3. The next generation is generated by genetic operations.

a) An individual is copied according to its fitness value (reproduction).

b) A new individual is generated by recombining two individuals (crossover).

c) A new individual is generated by random change (mutation).

4. If the termination condition is met, then the process ends. Otherwise, 
the process repeats from the calculation of fitness value in step 2. 
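
The procedure above can be sketched as a generic generational loop. This is
only an illustration under stated assumptions: the binary tournament
selection, the elitism, the operator probabilities and all names are
hypothetical choices that the text does not specify, and the toy demo uses
fixed-length bit tuples in place of GP trees to keep the example short.

```python
import random


def evolve(population, fitness, crossover, mutate,
           generations=40, p_cross=0.8, p_mut=0.1, target=None, rng=None):
    """Generational loop: evaluate, select, vary, repeat (steps 2 to 4)."""
    rng = rng or random.Random(0)
    population = list(population)          # step 1 is done by the caller
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]        # step 2
        best = max(population + [best], key=fitness)
        if target is not None and fitness(best) >= target:   # step 4
            break

        def select():
            # Binary tournament: the fitter of two random individuals.
            i = rng.randrange(len(population))
            j = rng.randrange(len(population))
            return population[i] if scores[i] >= scores[j] else population[j]

        next_gen = [best]                  # elitism (an added assumption)
        while len(next_gen) < len(population):               # step 3
            r = rng.random()
            if r < p_cross:
                child = crossover(select(), select(), rng)   # 3b crossover
            elif r < p_cross + p_mut:
                child = mutate(select(), rng)                # 3c mutation
            else:
                child = select()                             # 3a reproduction
            next_gen.append(child)
        population = next_gen
    return max(population + [best], key=fitness)


# Toy demo: maximize the number of ones in a 12-bit tuple.
def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def flip(a, rng):
    i = rng.randrange(len(a))
    return a[:i] + (1 - a[i],) + a[i + 1:]


rng = random.Random(1)
pop = [tuple(rng.randint(0, 1) for _ in range(12)) for _ in range(20)]
best = evolve(pop, sum, one_point, flip, generations=40, target=12, rng=rng)
print(best, sum(best))
```

For decision trees, the same loop would be used with subtree-swapping
crossover and subtree-replacing mutation in place of the bit operators.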

Generally, GP has no adequate way of controlling the growth of the tree,
because it does not evaluate the size of the tree. Therefore, during the
search the tree may become overly deep and complex, or may settle into a tree
that is too simple. A technique by which GP defines effective partial trees
has been proposed. The approach is automatic function definition (or
Automatically Defined Function: ADF), and it is achieved by adding gene
expressions for function definitions to normal GP [73.4]. By implementing
ADF, a more compact program can be produced, and the number of generation
cycles can be reduced. More than one ADF gene expression can be defined in
one individual.

An example of our GP expression is shown in Figure 73.1, where a decision
tree is expressed in a form similar to LISP code. GP-TREE expresses one
individual of GP and is composed of the ADF definition part and the main tree
part. “RPB” defines the main GP tree, and “ADF0” and “ADF1” each define an
ADF tree. “IFLTE” and “IFEQ” are function nodes. These functions require four
arguments (arg1, arg2, arg3 and arg4 in the following definitions), and they
are defined as follows.

(IFLTE arg1, arg2, arg3, arg4) if arg1 is less than or equal to (≤) arg2 then
evaluate arg3, else evaluate arg4.

(IFEQ arg1, arg2, arg3, arg4) if arg1 is equal to (=) arg2 then evaluate arg3,
else evaluate arg4.

    (:GP-TREE
      (:ADF0
        (IFEQ D "F" P N))
      (:ADF1
        (C))
      (:RPB
        (IFLTE A "T"
          (IFEQ B "T"
            (IFEQ C "F" N P) ADF0) P)))

    A="T"
      B="T"
        C="F": N
        C="T": P
      B="F"
        D="F": P
        D="T": N
    otherwise: P

Fig. 73.1. Expression of GP's Chromosome (The upper listing is an individual
expressed in LISP code and the lower listing is the same individual rewritten
as a decision tree.)

A, B, C and D express attributes in the database. “T” and “F” express
attribute values, and “N” and “P” express class names.
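
To make the semantics of IFLTE, IFEQ and ADF calls concrete, the following
sketch evaluates the individual of Figure 73.1 for a single record. The tuple
encoding and the function name `evaluate` are hypothetical. How IFLTE orders
categorical values is not specified in the text; the sketch simply relies on
Python's string comparison, so only records with A = "T" are used in the
demo. ADF1 is not called from the main tree and is omitted.

```python
def evaluate(node, record, adfs):
    """Evaluate a tree node for one record (dict: attribute -> value)."""
    if isinstance(node, str):
        if node in adfs:                 # call to an automatically defined function
            return evaluate(adfs[node], record, adfs)
        return node                      # terminal node: a class name
    op, attr, value, then_branch, else_branch = node
    if op == "IFEQ":
        test = record[attr] == value
    elif op == "IFLTE":
        test = record[attr] <= value     # assumption: Python's ordering of values
    else:
        raise ValueError(f"unknown function node: {op}")
    return evaluate(then_branch if test else else_branch, record, adfs)


# The individual of Figure 73.1 in a nested-tuple encoding.
ADFS = {"ADF0": ("IFEQ", "D", "F", "P", "N")}
RPB = ("IFLTE", "A", "T",
       ("IFEQ", "B", "T",
        ("IFEQ", "C", "F", "N", "P"),
        "ADF0"),
       "P")

records = [
    {"A": "T", "B": "T", "C": "F", "D": "T"},
    {"A": "T", "B": "T", "C": "T", "D": "T"},
    {"A": "T", "B": "F", "C": "T", "D": "F"},
    {"A": "T", "B": "F", "C": "T", "D": "T"},
]
results = [evaluate(RPB, r, ADFS) for r in records]
print(results)
```

The four records follow the four leaves below A = "T" in the decision tree
listing of the figure.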



73.3 Approach of Proposed Combined Learning 

To compensate for the disadvantages of the Apriori algorithm and GP while
keeping their advantages, we propose a rule discovery technique that combines
GP with the Apriori algorithm. By combining the two techniques, we expect to
find highly accurate rules in a large database. An outline of the proposed
technique is shown in Figure 73.2.




Fig. 73.2. Flow Chart of Approach of Proposed Combined Learning

The proposed rule discovery technique consists of the following steps.

1. First, the Apriori algorithm generates association rules.

2. Next, the generated association rules are converted into decision trees 
which are taken in as initial individuals of GP. The decision trees are 
trained by GP learning. 

3. The final decision tree is converted into classification rules. 

This allows effective schemata to be contained in the initial individuals of
GP. As a result, we expect the learning speed and the classification accuracy
of GP to improve. However, when GP is used for multi-value classification,
its learning speed may become slow because the number of definition nodes
increases. Therefore, it is difficult to apply the proposed technique to
multi-value classification.

An association rule is converted into a decision tree by the following
procedure.

1. First, a path of the decision tree is constructed by using the conditions
of the association rule as the attribute-based tests of the decision tree.

2. Next, the conclusion of the association rule is appended to the terminal
node of this path.

3. Finally, the class values of the terminal nodes that are not defined by
the association rule are assigned by choosing randomly from the set of
terminal nodes.
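
The three conversion steps can be sketched as follows. This is a minimal
illustration, not the authors' implementation: the nested-tuple tree encoding,
the use of IFEQ for every condition, and the function name `rule_to_tree` are
assumptions made for the example.

```python
import random


def rule_to_tree(conditions, conclusion, classes, rng=None):
    """Convert one association rule into a decision tree.

    conditions: list of (attribute, value) tests from the rule's condition part
    conclusion: class name from the rule's conclusion part
    classes:    set of terminal nodes (class names) to draw from at random
    """
    rng = rng or random.Random(0)
    tree = conclusion                           # step 2: leaf at the end of the path
    for attr, value in reversed(conditions):    # step 1: build the path bottom-up
        off_path = rng.choice(classes)          # step 3: random class off the path
        tree = ("IFEQ", attr, value, tree, off_path)
    return tree


# Hypothetical rule: (A = "T") and (B = "T") -> P
tree = rule_to_tree([("A", "T"), ("B", "T")], "P", ["P", "N"])
print(tree)
```

The resulting tree classifies records matching the rule as the rule does and
leaves every other branch to be corrected by GP learning.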

Only rules that contain the class attribute in the conclusion part are
selected for conversion, and each association rule is converted into one
decision tree. The conversion generates overly simple decision trees, but
GP's initial individuals do not need to be highly accurate, because they are
subsequently improved by GP learning. Because the conversion is simple, it
does not noticeably increase the amount of calculation. To convert GP's
decision tree into classification rules, we use the process proposed by
Quinlan [73.5].



73.4 Application to Rule Discovery from a Database

We applied the proposed technique to the meningoencephalitis diagnosis data
set. This database was donated by S. Tsumoto [73.6]. We applied the technique
to the task “find factors important for diagnosis (DIAG2) to judge bacteria
or virus”, and obtained the decision trees and rules generated by ADF-GP
shown below. In the proposed method, the association rules generated by the
Apriori algorithm were taken as the initial individuals of GP. We used 70
records for training and 140 records for testing; the 70 training records
were extracted at random. Before the experiment, we studied these data with
normal GP and tuned the GP parameters.

We use the following notation. “A eq B” expresses that attribute A is equal
to value B when the attribute takes discrete values. “A && B” connects two
parts of a rule, A and B, by “and”. The left side of “->” expresses the
conditions of a rule, and the right side of “->” expresses the conclusion of
the rule (or the class name).

Section 73.4.1 shows the results using ADF-GP only, and Section 73.4.2 shows
the results using the proposed technique.

73.4.1 ADF-GP Only 

The following rules were generated with ADF-GP. The generated rules are
composed of categorical attributes.

--- generated rules ---
rule1 :
  (EEG_FOCUS eq "-") && (CT_FIND eq "normal") && (SEX eq "M")
  && (RISK eq "n") -> VIRUS
rule2 :
  (EEG_FOCUS eq "-") && (CT_FIND eq "normal") && (SEX eq "M")
  && (RISK eq "p") -> BACTERIA
rule3 :
  (EEG_FOCUS eq "-") && (CT_FIND eq "normal") && (SEX eq "F")
  && (RISK eq "n") -> VIRUS
rule4 :
  (EEG_FOCUS eq "-") && (CT_FIND eq "normal") && (SEX eq "F")
  && (RISK eq "p") -> BACTERIA
rule5 :
  (EEG_FOCUS eq "+") && (CT_FIND eq "abnormal") && (SEX eq "F")
  && (RISK eq "n") -> VIRUS
rule6 :
  (EEG_FOCUS eq "+") && (CT_FIND eq "abnormal") && (SEX eq "F")
  && (RISK eq "p") -> BACTERIA
rule7 :
  (EEG_FOCUS eq "+") && (CT_FIND eq "abnormal") && (SEX eq "M")
  -> BACTERIA
rule8 :
  (EEG_FOCUS eq "-") && (CT_FIND eq "abnormal") -> BACTERIA
rule9 :
  (EEG_FOCUS eq "+") && (CT_FIND eq "normal") -> VIRUS





To examine the availability and the accuracy of the generated rules,
Table 73.1 shows, for each rule, its size, the number of test records to
which it was applied (use frequency), the number of those records it
classified wrongly (with the wrong classification rate), and the class it
assigns. Rule 6 was not used for any record. Rules 1 and 3, which have high
availability, show low wrong classification rates. Among the other rules that
were used, several (rules 2, 7, 8 and 9) have high wrong classification rates
regardless of their availability.

To examine the classification accuracy of the generated rule set, the
distribution of classifications over all data is shown in Table 73.2. The
table shows that a small number of records were misclassified between VIRUS
and BACTERIA.



73.4.2 Proposed Technique (Association Rules + ADF-GP)

The following rules were generated with the proposed technique. The generated
rules are composed of continuous-valued attributes.

Table 73.1. Evaluation on test data by each rule (ADF-GP only)

Rule  Size  Used  Wrong              Class
  1     4    33     4  (12.12 %)     VIRUS
  2     4     6     2  (33.33 %)     BACTERIA
  3     4    36     3  ( 8.33 %)     VIRUS
  4     4     2     0  ( 0.00 %)     BACTERIA
  5     4     7     0  ( 0.00 %)     VIRUS
  6     4     0     0  ( 0.00 %)     BACTERIA
  7     3     5     1  (20.00 %)     BACTERIA
  8     2    27     8  (29.63 %)     BACTERIA
  9     2    24     6  (25.00 %)     VIRUS

Table 73.2. Evaluation on test data by error distribution (ADF-GP only)

 (a)   (b)   <- classified as
  87    11   (a): class VIRUS
  13    29   (b): class BACTERIA

total hits = 116



--- generated rules ---
rule1 :
  (Cell_Poly <= 221) -> VIRUS
rule2 :
  (Cell_Poly > 221) && (EEG_FOCUS <= 200) -> BACTERIA
rule3 :
  (Cell_Poly > 221) && (EEG_FOCUS > 200) && (GCS <= 121)
  -> BACTERIA
rule4 :
  (Cell_Poly > 221) && (SEIZURE == 0) && (EEG_FOCUS > 200) && (GCS > 121)
  -> VIRUS
rule5 :
  (Cell_Poly > 221) && (SEIZURE != 0) && (EEG_FOCUS > 200) && (GCS > 121)
  -> BACTERIA



The performance of the generated rules is shown in Table 73.3. Rules 3, 4 and
5 were not used for any record. Rules 1 and 2 have high availability and low
wrong classification rates.



Table 73.3. Evaluation on test data by each rule (proposed method)

Rule  Size  Used  Wrong              Class
  1     1   108    10  ( 9.26 %)     VIRUS
  2     2    32     0  ( 0.00 %)     BACTERIA
  3     3     0     0  ( 0.00 %)     BACTERIA
  4     4     0     0  ( 0.00 %)     VIRUS
  5     4     0     0  ( 0.00 %)     BACTERIA



To examine the classification accuracy of the generated rule set, the
distribution of classifications over all data is shown in Table 73.4. The
table shows that some BACTERIA records were classified as VIRUS by mistake,
but almost all records were classified correctly.

Table 73.4. Evaluation on test data by error distribution (proposed method)

 (a)   (b)   <- classified as
  98     0   (a): class VIRUS
  10    32   (b): class BACTERIA

total hits = 130



73.4.3 Discussion for the Results 

In these results, the proposed method shows higher accuracy than ADF-GP only,
and the data set can be expressed with a smaller number of rules.
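
As a quick check of this comparison, the overall accuracy can be computed from
the error distributions in Tables 73.2 and 73.4 (rows are the true classes
VIRUS and BACTERIA, columns are the assigned classes). The helper name
`accuracy` is hypothetical; the counts are those reported in the tables.

```python
def accuracy(confusion):
    """Correctly classified records (the diagonal) divided by all records."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


adf_gp_only = [[87, 11],   # true VIRUS:    87 as VIRUS, 11 as BACTERIA
               [13, 29]]   # true BACTERIA: 13 as VIRUS, 29 as BACTERIA
proposed = [[98, 0],
            [10, 32]]

print(f"ADF-GP only: {accuracy(adf_gp_only):.1%}")   # 116 / 140
print(f"Proposed:    {accuracy(proposed):.1%}")      # 130 / 140
```

On the 140 test records this corresponds to 116 hits for ADF-GP only and 130
hits for the proposed method, obtained with nine and five rules respectively.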

The proposed method has no operation for pruning rules other than the GP
operations. GP operations are a kind of statistical operation. Thus, GP
sometimes obtains interesting rules, but at other times the result contains
meaningless rules. To address this problem, GP techniques that include a
pruning operation have been proposed [73.7], and such pruning could be built
into our proposed technique. Moreover, meaningless rules can also be removed
in the experiment by applying a threshold on availability. When the
experimental results are evaluated and cleaned by a domain expert after the
experiment, the load on the expert depends on the number of rules in the
results. With the proposed technique, the number of rules in the results can
be reduced compared with ADF-GP only.

We received the following comments on these results from the domain expert
(S. Tsumoto).

    Totally, the results obtained by ADF-GP are more interesting than
    the proposed methods. The results obtained by the proposed technique
    are very reasonable, but I do not see the meaning of “EEG_FOCUS
    > 200” and “GCS > 121”. Please let me know what the authors mean
    by that. Please show me the results for other problems.

The purpose of this research is to achieve high forecast accuracy with a
small number of rules. This purpose is not the same as the expert's interest
in the experimental results. Because the rules that interested the expert
were obtained by normal ADF-GP, we expect that such rules can also be
obtained by the proposed technique by increasing the effect of GP.



73.5 Conclusions 

In this paper, we proposed a technique for discovering rules from a database
using genetic programming combined with an association rule algorithm. To
verify the validity of the proposed method, we applied it to the
meningoencephalitis diagnosis activity data in a hospital, and discussed the
results of the proposed method and of normal ADF-GP with a domain expert. As
a result, the accuracy of the rules improved, and the proposed method
expressed the data set with a small number of rules. We conclude that the
proposed method is effective for improving rule accuracy and for reducing the
number of rules in the rule discovery problem. According to the comments of
the domain expert, however, ADF-GP alone obtained more interesting rules than
the proposed method.

In the future, we will study the following four topics. The first is to apply
the method to other problems for verification. We have already applied the
proposed method to other problems [73.8] [73.9], and we need to identify the
problems for which it is suitable through applications to various problems.
The second is to study an algorithm that converts association rules into
decision trees with high accuracy. The third is to extend the proposed method
to multi-value classification problems. For this, it is necessary to suppress
the increase in the number of definition nodes and to establish measures
against the decrease in learning speed caused by the additional nodes. The
fourth is to obtain rules as interesting as those obtained by ADF-GP only.



74.1 Introduction



The importance of extracting knowledge from databases is well established in the 
domain of medical science. Recent advances in hospital automation have intro- 
duced databases that store vast amounts of information on patients’ case histories. 
Since experiments involving human patients are not always possible, efficient and 
flexible data mining is expected to facilitate new medical discoveries from avail- 
able data. 

The objective of this paper was to analyze a meningoencephalitis data set, the 
test data at the JSAI KDD Challenge 2001 workshop [1], and to help organize the 
search for new information on this disease. The method of analysis is the cascade 
model developed by the author. Section 2 briefly introduces the model. The com- 
putation procedure for the challenge problem and the resulting rules are shown in 
Section 3. We also indicate the usefulness of visual inspection of data guided by 
the obtained rules. The last section discusses possible improvements in data min- 
ing using the cascade model. 



74.2 The Cascade Model 



The model examines an itemset lattice where an [attribute: value] pair is employed 
as an item to constitute itemsets. Links in the lattice are selected and expressed as 
rules [2]. Figure 1.1 shows a typical example of a link and its expressed rule. 
Here, the problem contains five attributes: A-E, each of which takes (y, n) values. 
The itemset at the upper end of the link has item [A: y], and another item [B: y]
is added along the link. Items of the other attributes are called veiled items. The
tables attached to the nodes show the frequencies of veiled items.

Upper node [A: y]

       y            n            WSS     variance
  B    60 ( 9.6)    40 (14.4)    24.0    .24
  C    50 (12.5)    50 (12.5)    25.0    .25
  D    60 ( 9.6)    40 (14.4)    24.0    .24
  E    40 (14.4)    60 ( 9.6)    24.0    .24

BSS of the link

  B    9.60
  C    0.00
  D    6.67
  E    5.40

Rule expressed by the link

  IF [B: y] added on [A: y]
  THEN [D: y; E: n]
  Cases: 100 -> 60
  [D: y] 60% -> 93%, BSS = 6.67
  [E: n] 60% -> 90%, BSS = 5.40

Lower node [A: y, B: y]

       y            n            WSS     variance
  B    60 (0.00)     0 (0.00)     0.00   .000
  C    30 (7.50)    30 (7.50)    15.00   .250
  D    56 (0.25)     4 (3.48)     3.73   .062
  E     6 (4.86)    54 (0.54)     5.40   .090

Fig. 1.1. A sample link, its rule expression and properties of the veiled items.



In order to evaluate the strength of a rule, the within-group sum of squares
(WSS) and between-group sum of squares (BSS) are defined by the following
formulae [3, 4],

WSS_i = \frac{n}{2} \left( 1 - \sum_a p_i(a)^2 \right)                      (1)

BSS_i = \frac{n^L}{2} \sum_a \left( p_i^L(a) - p_i^U(a) \right)^2          (2)

where i designates an attribute; the superscripts U and L indicate the upper and
lower nodes, respectively; n shows the number of supporting cases of a node; and
p_i(a) is the probability of obtaining the value a for attribute i.

Figure 1.1 shows the WSS_i and BSS_i values along with their sample variances. A
large BSS_i value is evidence of a strong interaction between the added item and
attribute i. The textbox in Figure 1.1 shows the derived rule. The added
item [B: y] appears as the main condition in the LHS, while the items in the upper
node are placed at the end of the LHS as preconditions. When a veiled attribute
has a large BSS_i value, one of its items is placed in the RHS of a rule. The method
of selecting items from a veiled attribute was described in [3].
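
The values in Figure 1.1 can be reproduced with a short sketch. It assumes
the definitions WSS_i = (n/2)(1 - sum_a p_i(a)^2) and
BSS_i = (n^L/2) sum_a (p_i^L(a) - p_i^U(a))^2, which agree with every number
printed in the figure; the function names are hypothetical.

```python
def wss(counts):
    """Within-group sum of squares of one attribute at one node.

    counts: dict mapping each attribute value to its number of cases.
    """
    n = sum(counts.values())
    return n / 2 * (1 - sum((c / n) ** 2 for c in counts.values()))


def bss(upper, lower):
    """Between-group sum of squares of one attribute along a link."""
    n_u = sum(upper.values())
    n_l = sum(lower.values())
    return n_l / 2 * sum(
        (lower.get(a, 0) / n_l - upper[a] / n_u) ** 2 for a in upper
    )


# Frequencies of the veiled items in Figure 1.1.
upper = {"B": {"y": 60, "n": 40}, "C": {"y": 50, "n": 50},
         "D": {"y": 60, "n": 40}, "E": {"y": 40, "n": 60}}
lower = {"B": {"y": 60, "n": 0}, "C": {"y": 30, "n": 30},
         "D": {"y": 56, "n": 4}, "E": {"y": 6, "n": 54}}

for attr in "BCDE":
    print(attr,
          round(wss(upper[attr]), 2),
          round(wss(lower[attr]), 2),
          round(bss(upper[attr], lower[attr]), 2))
```

Attribute C keeps the same distribution along the link, so its BSS is zero,
whereas D and E change markedly and therefore appear in the RHS of the rule.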

We can control the appearance of attributes in the LHS by restricting the
attributes in the itemset node. On the other hand, the attributes in the RHS can be
selected by setting the minimum BSS_i value of a rule (min-BSS_i) for each attribute. It
is not necessary for items in the RHS of a rule to reside in the lattice. This is in
sharp contrast to association rule miners, which require the itemset [A: y; B: y; D:
y; E: n] to derive the rule in Figure 1.1. These characteristics of the cascade model
make it possible to detect rules efficiently [5].



74.3 Results and Discussion 



74.3.1 Computation by DISCAS 



The data set provided for the JSAI KDD Challenge 2001 consists of records on
140 meningoencephalitis patients [1]. Each record contains 40 attribute values. All
numerical data were categorized as shown in Table 1.1. For example, the attribute
“COLD” was converted into one of the three items: “cold≤0”, “0<cold≤5” or
“cold>5”.
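
The categorization step can be sketched as a small function that maps a
numeric value to an item using the splitting values of Table 1.1. The sketch
assumes that each splitting value is an inclusive upper bound, so that the
items partition the value range; the function name `to_item` and the ASCII
spelling `<=` in the labels are choices made for this example only.

```python
from bisect import bisect_left


def to_item(name, value, splits):
    """Map a numeric value to an item label; upper bounds are inclusive."""
    k = bisect_left(splits, value)    # number of split points below `value`
    if k == 0:
        return f"{name}<={splits[0]}"
    if k == len(splits):
        return f"{name}>{splits[-1]}"
    return f"{splits[k - 1]}<{name}<={splits[k]}"


cold_splits = [0, 5]                  # splitting values of COLD in Table 1.1
labels = [to_item("cold", v, cold_splits) for v in (0, 3, 5, 7)]
print(labels)
```

With the splitting values 0 and 5, a value equal to 5 falls into the middle
item and only larger values fall into the last one.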



We analyzed the data set using DISCAS software (version 2.1), which was
developed by the author. Factors affecting diagnosis, detection of bacteria or virus,
and prognosis were examined by changing the RHS attribute. The results are
shown separately in the following subsections. All calculations were done using a
600-MHz Pentium III PC with 256 MB of memory. The pruning conditions were
set to minsup = 0.01 and thres = 0.05 (see reference [5] for the meaning of these
parameters). DISCAS generated a lattice within 2.5 minutes that contained about
50,000 nodes for diagnosis and culture detection problems, while it took 10 minutes
of calculation to construct a lattice with 120,000 nodes for the prognosis problem.

Table 1.1. Categorization of numerical attributes

Attribute    Splitting values
AGE          20, 30, 40, 50
COLD         0, 5
HEADACHE     0, 3, 6, 9
FEVER        0, 3, 6, 10
NAUSEA       0, 3
LOC          0, 1
SEIZURE      0
BT           36, 37, 38, 39
STIFF        0, 1, 2, 3
GCS          14
WBC          5000, 6000, 8000, 10000
CRP          0, 1, 3
ESR          0, 10, 25
CSF_CELL     50, 125, 300, 750
Cell_Poly    8, 20, 50, 300
Cell_Mono    50, 125, 300, 750
CSF_PRO      0, 60, 100, 200
CSF_GLU      40, 50, 60, 70

We assume that the significance of a rule is roughly proportional to its BSS
value, so we show the rules with large BSS values. However, we have to be careful in
the selection of rules from computation results, since sets of rules that share many of
the same supporting instances should be considered different expressions of the
same phenomenon. Let us think of a classification problem for [Class: pos]. If we
obtain the following three rules, they show the existence of a data segment sharing
items [A: y], [B: n], [C: y], and [Class: pos]. We believe that the strongest rule
should be selected from equivalent expressions using the BSS criterion, although
other expressions are often useful as related knowledge.

IF [A: y] added on [B: n]
THEN [Class: pos]
THEN [C: y]

IF [C: y] added on [B: n]
THEN [Class: pos]
THEN [A: y]

IF [B: n] added on [A: y]
THEN [Class: pos]
THEN [C: y]
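
Selecting the strongest rule among equivalent expressions can be sketched as
follows. This is a simplification under stated assumptions: rules are treated
as equivalent when they involve exactly the same set of items, whereas the
text speaks more loosely of rules sharing many supporting instances. The rule
encoding, the function name and the BSS values are hypothetical.

```python
from collections import defaultdict


def strongest_per_segment(rules):
    """Group rules by the items they involve and keep the largest-BSS rule."""
    groups = defaultdict(list)
    for rule in rules:
        segment = frozenset(rule["main"]) | frozenset(rule["pre"]) | frozenset(rule["rhs"])
        groups[segment].append(rule)
    return [max(group, key=lambda r: r["bss"]) for group in groups.values()]


# The three example rules above, with made-up BSS values for illustration.
rules = [
    {"main": ["A: y"], "pre": ["B: n"], "rhs": ["Class: pos", "C: y"], "bss": 4.1},
    {"main": ["C: y"], "pre": ["B: n"], "rhs": ["Class: pos", "A: y"], "bss": 3.2},
    {"main": ["B: n"], "pre": ["A: y"], "rhs": ["Class: pos", "C: y"], "bss": 2.5},
]
selected = strongest_per_segment(rules)
print(selected)
```

The two rules that are not selected can still be reported alongside the
strongest one as related knowledge.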



74.3.2 Diagnosis 

The dataset guide indicates that differential diagnosis is important in determining
whether the disease is bacterial or viral meningitis. We analyzed the dataset, setting
DIAG2 (the grouped attribute of DIAG) as the RHS attribute. All attributes
were employed as LHS attributes, except DIAG and the 11 attributes whose values
were obtained after the initial diagnosis. They were CULTURE,
CULT_FIND, THERAPY2, C_COURSE, C_COURSE (Grouped), CSF_CELL3,
CSF_CELL7, C_COURSE, COURSE(Grouped), RISK, and RISK(Grouped). The
top two rules in the first rule set are shown below. These are the strongest rules
leading to the diagnosis of bacterial and viral meningitis, respectively.

IF [Cell_Poly > 300] added on []
THEN Diag2 = BACTERIA 30.0% -> 100.0%;

IF [20 < Cell_Poly ≤ 50] added on [CSF_CELL > 750]
THEN Diag2 = VIRUS 38.6% -> 100.0%;

Since the cascade model has a search range that is limited to the propositional
calculus domain, it cannot draw upon an expert's knowledge in comparing
Cell_Poly and Cell_Mono directly. However, we can easily reach the same
conclusion if we inspect the scattergram in Fig. 1.2, referencing the
constraint: CSF_CELL = Cell_Poly + Cell_Mono.

Another analysis was carried out omitting the attributes Cell_Poly and
Cell_Mono; the resulting rules are expected to lead to new knowledge viewed
from another angle. The rules for which BSS>3.0 are shown in Table 1.2.

Fig. 1.2. Scatter plot: Cell_Poly vs CSF_CELL

Table 1.2. Strong rules obtained for DIAG2

No  Main condition        Preconditions     Change in Diag2 distribution*    BSS
                                            (bacteria virus)
1   [CRP>3]               [FOCAL: -]        (27 78)/105 -> (15 3)/18         5.98
    If no precondition is applied, (42 98)/140 -> (19 5)/24, BSS=5.80.
2   [CT_FIND: abnormal]   [EEG_FOCUS: -]    (32 72)/104 -> (19 8)/27         4.23
    The percentage of [SEX: M] also changes 62% -> 93%.
    If no precondition is applied, (42 98)/140 -> (23 16)/39.
    If [FEVER: 0, EEG_FOCUS: -] is the precondition, (7 17)/24 -> (5 0)/5.
3   [CT_FIND: abnormal]   [NAUSEA=0,        (14 37)/51 -> (7 0)/7            3.68
                          LOC_DAT: -]

*Values in parentheses show the number of bacterial and viral cases, while the value after
the slash gives the number of all instances for all attribute values.



Figure 1.3 shows the changes in the Diag2 distributions by Spotfire [6]. Axes were
selected following Rule 1 in Table 1.2. We can see a clear increase in the bacteria
ratio in the top left pie chart, and the distribution changes in the other charts seem
reasonable.

Table 1.3 depicts the rules when we employed DIAG as the RHS attribute. The rules
for which BSS>3.0 are shown. When no rule appeared to discriminate a class, the
strongest rule related to the class was also included, although no rules were found
indicating BACTE(E). Six classes appeared in the distribution list, of which the
first four were bacterial and the last two were viral. The class showing a
remarkable change is underlined. Rules 4 and 6 in Table 1.3 indicate the
characteristic segment of an abscess, related to the CT abnormality mentioned in [1].

Table 1.3. Strong rules obtained for DIAG

No  Main condition        Preconditions         Changes in Diag distribution                    BSS
                                                (Abscess Bacteria Bacte(E) TB(E)
                                                Virus(E) Virus)
1   [Cell_Poly > 300]     []                    (9 24 8 1 30 68)/140 -> (2 21 7 0 0 0)/30;      8.88
                                                Increase in Bacteria, decrease in Virus
    If [FOCAL: -] is the precondition, (4 19 4 0 14 64)/105 -> (1 16 4 0 0 0)/21.
    If [EEG_FOCUS: -] is the precondition, (8 19 4 1 11 61)/104 -> (2 16 4 0 0 0)/22.
2   [EEG_FOCUS: +]        [SEX: F,              (0 6 0 0 13 30)/49 -> (0 1 0 0 10 0)/11;        4.35
                          CT_FIND: normal]      Increase in Virus(E), decrease in Virus
    If [SEX: F] is the precondition, (1 6 0 0 19 32)/58 -> (0 1 0 0 15 2)/18.
    If [CT_FIND: normal] is the precondition, (0 15 4 0 19 63)/101 -> (0 3 3 0 13 5)/24.
    If no precondition is applied, (9 24 8 1 30 68)/140 -> (1 5 4 0 19 7)/36.
3   [LOC_DAT: +]          [EEG_FOCUS: -]        (8 19 4 1 11 61)/104 -> (4 8 3 1 8 2)/26;       4.27
                                                Decrease in Virus
4   [CT_FIND: abnormal]   [EEG_FOCUS: -]        (8 19 4 1 11 61)/104 -> (8 7 3 1 5 3)/27;       3.95
                                                Decrease in Virus, increase in Abscess
5   [FOCAL: +]            [CRP≤0,               (0 3 0 0 5 34)/42 -> (0 1 0 0 5 0)/6;           3.52
                          CT_FIND: normal,      Decrease in Virus, increase in Virus(E)
                          EEG_FOCUS: -]
    If [CRP≤0, EEG_FOCUS: -] is the precondition, (2 5 1 1 8 35)/52 -> (2 2 0 1 7 1)/13.
6   [CSF_PRO≤0]           [EEG_FOCUS: -,        (5 0 0 0 5 18)/28 -> (5 0 0 0 1 0)/6;           2.53
                          Cell_Poly ≤ 8]        Increase in Abscess, decrease in Virus
    Major changes in other attributes: [FOCAL = +] 29% -> 83%, [CT_FIND: abnormal]
    21% -> 83%, [CSF_CELL ≤ 50] 39% -> 100%, [Cell_Mono ≤ 50] 39% -> 100%.

Pie Chart of Diagnosis

Fig. 1.3. Visualization of Rule 1 in Table 1.2. Black: bacteria; white: virus.



74.3.3 Detection of Bacteria or Virus 



Table 1.4 shows the rules with BSS>2.0 when CULT_FIND was used as the RHS
attribute. The attributes for the LHS were the same as those in the diagnosis
problem. All BSS values are relatively low. In fact, the distribution change shown in
Figure 1.4 seems to be unnatural for Rule 1 in Table 1.4. All rules except rule 6 in
Table 1.4 deserve no further discussion after such visual inspections are applied.

For the problem of specific culture findings, we could detect no strong rules. The
only exception was Rule 8 in Table 1.4, with which all species found were herpes.

Fig. 1.4. Visualization of Rule 1 in Table 1.4. Black: found; white: not found

Table 1.4. Strong rules obtained for CULT_FIND

No  Main condition        Preconditions            Change in CULT_FIND              BSS
                                                   distribution (False True)
1   [BT>39.0]             [125<Cell_Mono≤300]      (18 5)/23 -> (0 5)/5             3.06
2   [38.0<BT≤39.0]        [SEX: M, COLD≤0,         (18 6)/24 -> (0 4)/4             2.25
                          5000<WBC≤6000]
    If [5000<WBC≤6000, CRP≤0] is the precondition, (31 10)/41 -> (2 6)/8, BSS=2.05.
3   [HEADACHE≤0]          [FOCAL: -,               (14 5)/19 -> (0 4)/4             2.17
                          Cell_Mono≤50]
4   [100<CSF_PRO≤200]     [KERNIG: 1]              (25 5)/30 -> (0 4)/4             2.78
5   [LOC>1]               [SEX: F, COLD≤0]         (27 7)/34 -> (1 5)/6             2.36
6   [AGE>50]              [SEX: F, CRP≤0]          (24 8)/32 -> (0 4)/4             2.25
7   [FOCAL: +]            [60<CSF_PRO≤100]         (33 19)/43 -> (1 5)/6            2.17
    Always accompanied by [CT_FIND: abnormal].
8   [EEG_FOCUS: +]        [36.0<BT≤37.0]           (14 4)/18 -> (0 3)/3             1.81
                                                   All are herpes.



74.3.4 Prognosis 

COURSE(Grouped) was employed as the RHS attribute and the resulting rules 
with BSS>2.0 are shown in Table 1.5. Only CSF_CELL3 and C_COURSE are 
omitted from the set of LHS attributes. Even if we use C_COURSE as the RHS 
attribute, we cannot find a specific course except Rules 3 and 9 in Table 1.5. Rule 
3 appears interesting, as it indicates the existence of a data cluster. 



Table 1.5. Strong rules obtained for COURSE(Grouped)

No  Main condition        Preconditions               Change in COURSE(Grouped)      BSS
                                                      distribution (neg pos)
1   [FOCAL: +]            [Diag2: VIRUS]              (81 17)/98 -> (9 11)/20        2.84
2   [NAUSEA>3]            [STIFF=2, CULT_FIND: F]     (27 9)/36 -> (0 5)/5           2.81
3   [THERAPY2: ARA_A]     [SEX: F]                    (48 10)/58 -> (0 4)/4          2.74
    Always accompanied by [DIAG: VIRUS(E), BT≤36.0, LOC_DAT = +,
    FOCAL = +, EEG_FOCUS = +]. 3 aphasia and 1 amnesia.
4   [ONSET: SUBACUTE]     [CT_FIND: normal,           (69 8)/77 -> (0 3)/3           2.41
                          EEG_FOCUS: -]
5   [FOCAL: +]            [Diag2: VIRUS, CRP=0]       (50 12)/62 -> (5 8)/13         2.31
6   [CSF_GLU>70]          [0<CSF_PRO≤60]              (32 5)/37 -> (0 3)/3           2.24
7   [NAUSEA>3]            [COLD≤0, LOC_DAT: -,        (16 3)/19 -> (0 3)/3           2.13
                          300<Cell_Mono≤750]
8   [LOC>1]               [CULT_FIND: F]              (89 18)/107 -> (2 5)/7         2.09
9   [CSF_GLU≤40]          [SEX: M,                    (24 5)/29 -> (0 3)/3           2.05
                          THERAPY2: no_therapy]       all dead
10  [FEVER>10]            [40<CSF_GLU≤50]             (26 5)/31 -> (1 4)/5           2.04
11  [DIAG: VIRUS(E)]      [SEX: F, CULT_FIND: F]      (39 8)/47 -> (5 7)/12          2.05
12  [100<CSF_PRO≤200]     [STIFF=2]                   (37 13)/50 -> (5 9)/14         2.05

The number of supporting instances for Rule 9 is only 3, but all three patients
died. They are shown in the bottom right pie chart in Figure 1.5. If we omit
the precondition [Therapy2: no_therapy] during the visual inspection, we can
recognize 5 positives (all dead) among 12 patients in the same pie chart.



74.4 Concluding Remarks 



The cascade model can provide many strong rules effectively. Sometimes sets of
related rules are found by using different BSS values and preconditions. We then
have to refer them to expert evaluation to determine their importance and validity.
Of special interest is whether the BSS values are consistent with the importance of
the rules.

Although the resulting rules are powerful, several improvements are expected to
better express them. The first is the optimization of rules. If we move the split
values of categorizations, and add/remove precondition clauses, then the resulting
rules will surely be improved. The second is the presentation of related rules. If
they are expressed in a group sorted by their BSS values, analysis will be easier.
Visualization also proved to be useful if variables in rule conditions are used as
axes. Analysts can often distinguish reasonable rules from nonsensical ones by
visual inspection.



Fig. 1.5. Visualization of Rule 9 in Table 5. Only no_therapy are shown.
Black: pos; white: neg. (Axis label: SEX.)



This paper describes an application of two rough sets based systems, namely
GDT-RS and RSBR, for mining if-then rules in a meningitis dataset. GDT-RS
(Generalization Distribution Table and Rough Set) is a soft hybrid induction
system, and RSBR (Rough Sets with Boolean Reasoning) is used for discretization
of real valued attributes as a preprocessing step carried out before GDT-RS
starts. We argue that discretization of continuous valued attributes is an
important pre-processing step in the rule discovery process. We illustrate that
the quality of rules discovered by GDT-RS is strongly affected by the result of
discretization.



75.1 Introduction 

Rough set theory constitutes a sound basis for Knowledge Discovery and Data
Mining. It offers useful tools to discover patterns hidden in data in many
aspects [75.8, 75.9]. It can be used in different phases of the knowledge
discovery process such as attribute selection, attribute extraction, data
reduction, decision rule generation, and pattern extraction (templates,
association rules).

This paper describes an application of two rough sets based systems, namely
GDT-RS and RSBR, for mining if-then rules in a meningitis dataset. The core of
the rule discovery process is GDT-RS, a soft hybrid induction system for
discovering classification rules from databases with uncertainty and
incompleteness [75.10, 75.2, 75.3]. The system is based on the combination of
the Generalization Distribution Table (GDT) and the rough set methodology. A
GDT is a table in which the probabilistic relationships between concepts and
instances over discrete domains are represented. The GDT provides a
probabilistic basis for evaluating the strength of a rule. Furthermore, the
rough set methodology is used to find minimal relative reducts from the set of
rules with larger strengths.

In the pre-processing before using GDT-RS, a system called RSBR is used for
discretization of real valued attributes. The system is based on the
combination of the rough set method and Boolean reasoning proposed by Nguyen
and Skowron [75.6, 75.11]. A variant of the rule selection criteria in GDT-RS
is used in RSBR. Thus, the discretization of real valued attributes not only
finds a minimal relative reduct, but also considers the effect of the
discretized attribute values on the performance of our induction system GDT-RS.

We argue that discretization of continuous valued attributes is an important
pre-processing step in the rule discovery process. Rules induced without
discretization are of low quality because they will not recognize many new
objects. We illustrate that the quality of rules discovered by GDT-RS is
strongly affected by the result of discretization.



75.2 Rule Discovery by GDT-RS 

GDT-RS is a soft hybrid induction system for discovering classification rules 
from databases with uncertain and incomplete data [75.10, 75.2]. The system 
is based on a hybridization of Generalization Distribution Table (GDT) and 
the Rough Set methodology. The main features of GDT-RS are the following: 

— Biases for search control can be selected in a flexible way. Background 
knowledge can be used as a bias to control the initiation of GDT and in 
the rule discovery process. 

— The rule discovery process is oriented toward inducing rules with high 
quality of classification of unseen instances. The rule uncertainty, including 
the ability to predict unseen instances, can be explicitly represented by the 
rule strength. 

— A minimal set of rules with the minimal (semi-minimal) description length, 
having large strength, and covering of all instances can be generated. 

— Interesting rules can be induced by selecting a discovery target and class
transformation.

In [75.10, 75.3], we illustrated the first two features. This paper discusses 
the last two features of the GDT-RS. 



75.2.1 GDT and Rule Strength 

Any GDT consists of three components: possible instances, possible gene- 
ralizations of instances, and probabilistic relationships between possible in- 
stances and possible generalizations. Here the possible instances are all possi- 
ble combinations of attribute values in a database; the possible generalizations 
for instances are all possible cases of generalization for all possible instances; 
the probabilistic relationships between possible instances and possible genera- 
lizations, represented by entries Gij of a given GDT, are defined by means of a 
probabilistic distribution describing the strength of the relationship between 
any possible instance and any possible generalization. The prior distribution 
is assumed to be uniform if background knowledge is not available. Thus, it
is defined by Eq. (75.1)

$$G_{ij} = p(PI_j \mid PG_i) =
\begin{cases}
\dfrac{1}{N_{PG_i}} & \text{if } PI_j \in PG_i \\
0 & \text{otherwise}
\end{cases}
\tag{75.1}$$

where $PI_j$ is the j-th possible instance, $PG_i$ is the i-th possible
generalization, and $N_{PG_i}$ is the number of the possible instances
satisfying the i-th possible generalization, that is,

$$N_{PG_i} = \prod_{k \in \{l \mid PG_i[l] = *\}} n_k \tag{75.2}$$

where $PG_i[l]$ is the value of the l-th attribute in the possible
generalization $PG_i$, and $n_k$ is the number of different attribute values in
attribute k. The symbol "*", which specifies a wild card, denotes the
generalization for instances.¹ Certainly we have $\sum_j G_{ij} = 1$ for any i.
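
To make Eqs. (75.1) and (75.2) concrete, the following is a minimal Python sketch under stated assumptions: the attribute domains `DOMAINS` are hypothetical (they happen to have the same sizes as the sample database used later in this chapter), and the function names `n_pg`, `matches`, and `gdt_entry` are illustrative, not part of GDT-RS.

```python
from itertools import product
from math import prod

# Hypothetical attribute domains (illustrative only).
DOMAINS = {"a": ["a0", "a1"], "b": ["b0", "b1", "b2"], "c": ["c0", "c1"]}
WILDCARD = "*"


def n_pg(generalization):
    """Eq. (75.2): product of the domain sizes n_k over the wildcard positions."""
    return prod(
        len(DOMAINS[attr])
        for attr, value in zip(DOMAINS, generalization)
        if value == WILDCARD
    )


def matches(instance, generalization):
    """True if the instance agrees with every non-wildcard attribute value."""
    return all(g == WILDCARD or g == v for v, g in zip(instance, generalization))


def gdt_entry(instance, generalization):
    """Eq. (75.1): uniform prior p(PI_j | PG_i)."""
    if matches(instance, generalization):
        return 1.0 / n_pg(generalization)
    return 0.0


# All possible instances, and one row of the GDT for the generalization *b1c1.
possible_instances = list(product(*DOMAINS.values()))
row = [gdt_entry(pi, ("*", "b1", "c1")) for pi in possible_instances]
```

Summing `row` gives 1, which is the normalization property stated after Eq. (75.2).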

Let us recall some basic notions for rule discovery from databases represented
by decision tables in rough set theory. A decision table is a tuple
$T = (U, A, C, D)$, where $U$ is a nonempty finite set of objects called the
universe, $A$ is a nonempty finite set of primitive attributes, and
$C, D \subseteq A$ are two subsets of attributes that are called condition and
decision attributes, respectively [75.8, 75.9]. By $IND(B)$ we denote the
indiscernibility relation defined by $B \subseteq A$, $[x]_{IND(B)}$ denotes the
indiscernibility (equivalence) class defined by $x$, and $U/B$ the set of all
indiscernibility classes of $IND(B)$.
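
As an illustration of the partition U/B, here is a small Python sketch. It uses the sample database that this chapter gives in Table 75.1; the dictionary layout and the helper name `ind_classes` are assumptions made for this example only.

```python
from collections import defaultdict

# Sample database of Table 75.1: object -> attribute values.
TABLE = {
    "u1": {"a": "a0", "b": "b0", "c": "c1", "d": "y"},
    "u2": {"a": "a0", "b": "b1", "c": "c1", "d": "y"},
    "u3": {"a": "a0", "b": "b0", "c": "c1", "d": "y"},
    "u4": {"a": "a1", "b": "b1", "c": "c0", "d": "n"},
    "u5": {"a": "a0", "b": "b0", "c": "c1", "d": "n"},
    "u6": {"a": "a0", "b": "b2", "c": "c1", "d": "n"},
    "u7": {"a": "a1", "b": "b1", "c": "c1", "d": "y"},
}


def ind_classes(table, attrs):
    """U/B: group objects that have identical values on all attributes in B."""
    groups = defaultdict(list)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attrs)].append(obj)
    return list(groups.values())


condition_classes = ind_classes(TABLE, ["a", "b", "c"])
decision_classes = ind_classes(TABLE, ["d"])
```

With the condition attributes {a, b, c}, the objects u1, u3, and u5 fall into one class even though their decision values differ, which is the situation the compound instances of Section 75.2.2 are meant to handle.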

In our approach, the rules are expressed in the following form:

$P \rightarrow Q$ with $S$

that is, "if P then Q with the strength S", where P denotes a conjunction of
conditions (i.e. $P \subseteq C$), Q denotes a concept that the rule describes
(i.e. $Q \subseteq D$), and S is a "measure of strength" of the rule.
Furthermore, S consists of three parts: $s(P)$, accuracy, and coverage, where
$s(P)$ is the strength of the generalization P (i.e. the condition of the
rule), the accuracy of the rule is measured by a noise rate function
$r(P \rightarrow Q)$, and coverage denotes how many instances are covered by
the rule. If some instances covered by the rule also belong to another class,
the coverage is a set: {number of instances belonging to the class, number of
instances belonging to another class}.

The strength of a given rule reflects the incompleteness and uncertainty in the
process of rule induction, influenced both by unseen instances and by noise.
The strength of the generalization P = PG is given by Eq. (75.3) under the
assumption that the prior distribution is uniform:

$$s(P) = \sum_{l} p(PI_l \mid P) = card([x]_{IND(P)}) \times \frac{1}{N_P} \tag{75.3}$$

where $card([x]_{IND(P)})$ is the number of observed instances satisfying the
generalization P. The strength of the generalization P represents explicitly
the prediction for unseen instances since possible instances are considered.
On the other hand, the noise rate is given by Eq. (75.4)

$$r(P \rightarrow Q) = 1 - \frac{card([x]_{IND(P)} \cap [x]_{IND(Q)})}{card([x]_{IND(P)})} \tag{75.4}$$

where $card([x]_{IND(P)} \cap [x]_{IND(Q)})$ is the number of all instances from
the class Q within the instances satisfying the generalization P. The noise
rate shows the quality of classification measured by the number of instances
satisfying the generalization P which cannot be classified into class Q. The
user can specify an allowed noise level as a threshold value. Thus, the rule
candidates with a noise level larger than the given threshold value will be
deleted.

¹ For simplicity, we would like to omit the wild card in some places in this paper.
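
The two measures can be sketched in Python as follows. This is an illustration, not the GDT-RS implementation: the rows are those of Table 75.1 in Section 75.2.2, the names `strength` and `noise_rate` are hypothetical, and counting distinct condition tuples in `strength` is an assumption that mirrors the compound instances of Step 2 of the search algorithm.

```python
from math import prod

ATTRS = ("a", "b", "c")
DOMAIN_SIZES = {"a": 2, "b": 3, "c": 2}  # n_k for each condition attribute
ROWS = [  # (a, b, c, d) for u1..u7 of Table 75.1
    ("a0", "b0", "c1", "y"),
    ("a0", "b1", "c1", "y"),
    ("a0", "b0", "c1", "y"),
    ("a1", "b1", "c0", "n"),
    ("a0", "b0", "c1", "n"),
    ("a0", "b2", "c1", "n"),
    ("a1", "b1", "c1", "y"),
]


def covered(condition):
    """Rows satisfying the generalization P, given as {attribute: value}."""
    return [
        r for r in ROWS
        if all(r[ATTRS.index(k)] == v for k, v in condition.items())
    ]


def strength(condition):
    """Eq. (75.3): observed (compound) instances matching P, divided by N_P."""
    distinct = {r[:3] for r in covered(condition)}  # assumption: compound instances count once
    n_p = prod(n for attr, n in DOMAIN_SIZES.items() if attr not in condition)
    return len(distinct) / n_p


def noise_rate(condition, decision):
    """Eq. (75.4): share of instances matching P that are not in class Q."""
    rows = covered(condition)
    if not rows:
        raise ValueError("no instance satisfies the condition")
    in_class = sum(1 for r in rows if r[3] == decision)
    return 1 - in_class / len(rows)
```

For example, `strength({"a": "a0", "b": "b1"})` evaluates to 1 × 1/2 and `strength({"b": "b1", "c": "c1"})` to 2 × 1/2, the values used for the instance u2 in Step 6 of the search algorithm.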



75.2.2 A Searching Algorithm for Optimal Set of Rules 

We now describe the idea of a search algorithm for a set of rules developed
in [75.2]. We use the sample database shown in Table 75.1 to illustrate the
idea. Let $T_{noise}$ be a threshold value for the noise rate.



Table 75.1. A sample database

        a     b     c     d
u1      a0    b0    c1    y
u2      a0    b1    c1    y
u3      a0    b0    c1    y
u4      a1    b1    c0    n
u5      a0    b0    c1    n
u6      a0    b2    c1    n
u7      a1    b1    c1    y


Step 1. Create one or more GDTs. 

If prior background knowledge is not available the prior distribution of a 
generalization is calculated using Eq. (75.1) and Eq. (75.2). 

Step 2. Consider the indiscernibility classes with respect to the condition
attribute set C (such as $u_1$, $u_3$, and $u_5$ in the sample database of
Table 75.1) as one instance, called a compound instance (such as
$u'_1 = [u_1]_{IND(a,b,c)}$ in the following table). Then the probabilities
of generalizations can be calculated correctly.

Step 3. For any compound instance $u'$ (such as the instance $u'_1$ in the
above table), let $d(u')$ be the set of the decision classes to which the
instances in $u'$ belong. Furthermore, let $X_v = \{x \in U : d(x) = v\}$ be
the decision class corresponding to the decision value v. The rate $r_v$ can
be calculated by Eq. (75.4). If there exists a $v \in d(u')$ such that
$r_v(u') = \min\{r_{v'}(u') \mid v' \in d(u')\} < T_{noise}$, then we let the
compound instance $u'$ point to the decision class corresponding to v. If
there does not exist any $v \in d(u')$ such that $r_v(u') < T_{noise}$, we
treat the compound instance $u'$ as a contradictory one, and set the decision
class of $u'$ to $\bot$ (uncertain). For example,

                        a     b     c     d
u'1 {u1, u3, u5}        a0    b0    c1    ⊥


Let $U'$ be the set of all the instances except the contradictory ones.

Step 4. Select one instance u from $U'$. Using the idea of the discernibility
matrix, create a discernibility vector (that is, the row or the column with
respect to u in the discernibility matrix) for u. For example, the
discernibility vector for instance $u_2$: $a_0 b_1 c_1$ is as follows:

            u'1(⊥)    u2(y)    u4(n)    u6(n)    u7(y)
u2(y)       b         ∅        a, c     b        ∅

Step 5. Compute all the so-called local relative reducts for the instance u by
using the discernibility function. For example, from instance $u_2$:
$a_0 b_1 c_1$, we obtain two reducts {a, b} and {b, c}:

$$f_T(u_2) = (b) \wedge \top \wedge (a \vee c) \wedge (b) \wedge \top = (a \wedge b) \vee (b \wedge c).$$

Step 6. Construct rules from the local reducts for the instance u, and revise
the strength of each rule using Eqs. (75.3) and (75.4). For example, the
following rules are acquired for the instance $u_2$: $a_0 b_1 c_1$:

$\{a_0 b_1\} \rightarrow y$ with $S = 1 \times \frac{1}{2} = 0.5$, and
$\{b_1 c_1\} \rightarrow y$ with $S = 2 \times \frac{1}{2} = 1$.

Step 7. Select the best rules from the rules (for u) obtained in Step 6
according to their priority [75.10]. For example, the rule
"$\{b_1 c_1\} \rightarrow y$" is selected for the instance $u_2$:
$a_0 b_1 c_1$ because it matches more instances than the rule
"$\{a_0 b_1\} \rightarrow y$".

Step 8. $U' = U' - \{u\}$. If $U' \neq \emptyset$, then go back to Step 4.
Otherwise, go to Step 9.

Step 9. If any rule selected in Step 7 covers exactly one instance, then
STOP; otherwise, select a minimal set of rules covering all instances in
the decision table.
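
Steps 4 and 5 can be sketched in Python for the worked instance u2. This is a brute-force illustration under stated assumptions, not the authors' implementation: the instance list reproduces the columns of the discernibility vector above (with the compound instance u'1 marked uncertain as in the example of Step 3), the names `discernibility_vector` and `local_reducts` are hypothetical, and the reducts are found by enumerating attribute subsets, which is only practical for a handful of attributes.

```python
from itertools import combinations

ATTRS = ("a", "b", "c")
# Instances as they appear in the discernibility vector of Step 4.
INSTANCES = {
    "u1'": (("a0", "b0", "c1"), "uncertain"),
    "u2": (("a0", "b1", "c1"), "y"),
    "u4": (("a1", "b1", "c0"), "n"),
    "u6": (("a0", "b2", "c1"), "n"),
    "u7": (("a1", "b1", "c1"), "y"),
}


def discernibility_vector(name):
    """For each instance, the attributes that discern it from `name`
    (empty when both carry the same decision value)."""
    values, decision = INSTANCES[name]
    vector = {}
    for other, (other_values, other_decision) in INSTANCES.items():
        if other_decision == decision:
            vector[other] = frozenset()
        else:
            vector[other] = frozenset(
                a for a, x, y in zip(ATTRS, values, other_values) if x != y
            )
    return vector


def local_reducts(name):
    """Minimal attribute sets intersecting every non-empty vector entry,
    i.e. the prime implicants of the discernibility function."""
    clauses = [c for c in discernibility_vector(name).values() if c]
    hitting = [
        set(subset)
        for size in range(1, len(ATTRS) + 1)
        for subset in combinations(ATTRS, size)
        if all(set(subset) & clause for clause in clauses)
    ]
    # Keep only the sets with no proper subset that also hits every clause.
    return [s for s in hitting if not any(t < s for t in hitting)]


reducts_u2 = local_reducts("u2")
```

For u2 this yields the two reducts {a, b} and {b, c} obtained in Step 5.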

The following table shows the result for the sample database shown in
Table 75.1.

U               rules           s(P)    accuracy    coverage
u1, u3, u5      b0 → y          0.25    0.67        {2, 1}
u2, u7          b1 c1 → y       1       1           2
u4              c0 → n          0.17    1           1
u6              b2 → n          0.25    1           1


One can see that the discovered rule set is a minimal one having large
strength and covering all instances. Furthermore, the searching algorithm
can be conveniently used to discover a rule set with respect to an interesting
class (or a subset of classes) selected by the user as a discovery target. Thus,
by using class selection/transformation, combined with preprocessing steps
such as discretization, we can obtain more interesting results.



75.3 Discretization Based on RSBR 

Discretization of continuous valued attributes is an important pre-processing
step for rule discovery in databases with mixed types of data, including
continuous valued attributes. To address discretization, we have developed a
discretization system called RSBR that is based on the hybridization of rough
sets and Boolean reasoning proposed in [75.6].

A great effort has been made (see e.g. [75.5, 75.1, 75.4, 75.7]) to find
effective methods for discretization of continuous valued attributes. We may
obtain different results by using different discretization methods. The results
of discretization directly affect the quality of the discovered rules. Some
discretization methods totally ignore the effect of the discretized attribute
values on the performance of the induction algorithm. RSBR combines
discretization of continuous valued attributes and classification. In the
process of discretizing continuous valued attributes, we should also take into
account the effect of the discretization on the performance of our induction
system GDT-RS.

Roughly speaking, the basic concepts of the discretization based on RSBR 
can be summarized as follows: 

— Discretization of a decision table $T = (U, A \cup \{d\})$, where
$V_a = [v_a, w_a)$ is an interval of real values taken by attribute a, is a
searching process for a partition $P_a$ of $V_a$ for any $a \in A$ satisfying
some optimization criteria (like minimal partition) while preserving some
discernibility constraints [75.6].

— Any partition of $V_a$ is defined by a sequence of the so-called cuts
$v_1 < v_2 < \ldots < v_k$ from $V_a$.

— Any family of partitions $\{P_a\}_{a \in A}$ can be identified with a set of cuts.
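
The way a sequence of cuts induces a partition can be shown with a short Python sketch. It only applies given cuts; it does not implement the cut selection performed by RSBR. The function name `discretize` and the cut values are hypothetical and not taken from the experiments.

```python
from bisect import bisect_right


def discretize(value, cuts):
    """Index of the half-open interval [v_i, v_{i+1}) that contains `value`,
    where `cuts` is a sorted list v_1 < v_2 < ... < v_k."""
    return bisect_right(cuts, value)


# Hypothetical cuts for one real valued attribute (illustrative values only).
cuts = [36.0, 37.1]
codes = [discretize(v, cuts) for v in (35.5, 36.0, 36.9, 37.1, 39.2)]
```

With k cuts, every value is mapped to one of k + 1 interval codes, so a family of partitions for all attributes is fully described by the corresponding set of cuts.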



75.4 Application in Meningitis Data Mining 

This section shows the results of mining in a meningitis dataset by using
GDT-RS and RSBR cooperatively.

In the meningitis dataset, 19 of 38 attributes are continuous valued attributes
that must be discretized by RSBR before rule induction by GDT-RS. Since the
quality of rules discovered by GDT-RS is strongly affected by the result of
discretization of continuous valued attributes, the discretization must be
done carefully.

Furthermore, in this experiment, for each multi-class decision attribute, we
used two different modes of using GDT-RS and RSBR cooperatively:

1. All classes in a decision attribute are considered simultaneously when
using RSBR for discretization.

2. Focus on an interesting class selected by a user as the positive class (+),
and consider the other classes as the negative class (-). GDT-RS and RSBR
are used cooperatively to find the rules with respect to the focused
positive class. After that, one of the classes that were treated as
negative is selected as a new interesting positive class, and RSBR and
GDT-RS are used cooperatively again. This process is repeated until all
interesting classes have been selected as the positive class.
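
The class transformation used in the second mode can be sketched as follows. This is only an outline of the relabeling loop: the discretization and rule induction calls are omitted, the names `one_vs_rest` and `focus_schedule` are hypothetical, and visiting the classes in sorted order is an arbitrary choice made for the example (in the experiment the user selects the interesting class).

```python
def one_vs_rest(labels, positive):
    """Class transformation: the focused class becomes '+', all others '-'."""
    return ["+" if label == positive else "-" for label in labels]


def focus_schedule(labels):
    """Yield (focused class, transformed labels) for each class in turn."""
    for cls in sorted(set(labels)):
        yield cls, one_vs_rest(labels, cls)


# Hypothetical multi-class decision values (illustrative only).
labels = ["n", "p", "n", "q", "p"]
for focused, transformed in focus_schedule(labels):
    # Here RSBR would discretize against `transformed`, then GDT-RS would
    # induce rules for the '+' class; both steps are outside this sketch.
    pass
```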

Here we show an interesting result, namely finding factors important for
predicting prognosis (COURSE and C_COURSE).

First we consider all classes during discretization. The following 2 of 11
reasonable rules are interesting ones.

r1.1: FEVER(> 8) ∧ BT(< 37.1) → CULTURE(-)
with coverage = {13, 2}, accuracy = 86%.

r1.2: FOCAL(+) ∧ CT_FIND(normal) → CULTURE(-)
with coverage = {12, 2}, accuracy = 85%.

Then we focus on an interesting class, that is, CULTURE(-). The following
2 of 26 reasonable rules are interesting ones.

r2.1: COLD(< 9) ∧ BT(> 37.1) ∧ LOC_DAT(-) ∧ Cell_Mono(< 429)
→ CULTURE(-) with coverage = 31, accuracy = 1.

r2.2: COLD(< 9) ∧ LOC_DAT(-) ∧ Cell_Poly(> 32) ∧ CSF_PRO(< 93)
→ CULTURE(-) with coverage = 30, accuracy = 1.

According to a medical doctor's opinion, all these rules (r1.1, r1.2 and
r2.1, r2.2) are reasonable, but the rules r2.1 and r2.2, which are learned by
focusing on an interesting class, are much better ones.

This example shows that more interesting rules can be generated by selecting
an interesting discovery target, because a better discretization result is
obtained.
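
As a quick check of the reported figures, assume that accuracy equals one minus the noise rate of Eq. (75.4) and that a coverage set lists {instances in the class, instances in another class}, as defined in Section 75.2.1. Then for the two rules obtained in the first mode:

$$\text{accuracy}(r_{1.1}) = \frac{13}{13 + 2} \approx 0.867, \qquad \text{accuracy}(r_{1.2}) = \frac{12}{12 + 2} \approx 0.857$$

These values agree with the reported 86% and 85% up to rounding or truncation. The two rules obtained in the second mode cover no instance from another class, which corresponds to an accuracy of 1.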



75.5 Conclusion 

We have presented an application of two rough sets based systems, GDT-RS
and RSBR, for mining if-then rules from a meningitis dataset. The experimental
results illustrate that the quality of rules discovered by GDT-RS is strongly
affected by the results of discretization of continuous valued attributes, so
the discretization must be done carefully. Using RSBR and GDT-RS cooperatively
is a good way to discover rules in datasets with mixed types of attributes and
multiple classes.